We are beginning to roll out a number of AWS Lambda functions that connect to a variety of APIs and stream the results into your existing AWS infrastructure. Next on our list of serverless streaming connectors is one for streaming Twitter searches into an AWS data lake, allowing you to train machine learning models on any Twitter search, as well as use the results in a variety of web, mobile, or other applications. It provides a plug-and-play, scalable way to stream data available via existing APIs into your organization’s data lake.
The Streamdata.io AWS Lambda streaming connector for Twitter searches is available in the AWS Serverless Application Repository. It allows you to deploy the serverless function within your existing AWS infrastructure, proxy Twitter API searches using Streamdata.io, and publish the results in real time to an AWS S3 bucket. You can turn on a scalable faucet of data from a few Twitter searches, or as many as you’d like to conduct, and orchestrate the streams based upon events or a schedule using AWS CloudWatch Events, which lets you manage each function and pay only when the streams are running.
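As a rough illustration of the scheduling piece, here is a minimal Python sketch of how a CloudWatch Events schedule could be attached to a deployed function. It assumes a boto3 `events` client is passed in; the helper names, the rule name, and the function ARN are hypothetical and are not taken from the connector itself.

```python
def schedule_rule_params(rule_name, schedule_expression, function_arn):
    """Build the keyword arguments for the put_rule and put_targets calls."""
    rule = {
        "Name": rule_name,
        # For example "rate(5 minutes)" or a cron(...) expression.
        "ScheduleExpression": schedule_expression,
        "State": "ENABLED",
    }
    targets = {
        "Rule": rule_name,
        "Targets": [{"Id": f"{rule_name}-target", "Arn": function_arn}],
    }
    return rule, targets


def apply_schedule(events_client, rule_name, schedule_expression, function_arn):
    """Create or update the rule, then point it at the Lambda function."""
    rule, targets = schedule_rule_params(rule_name, schedule_expression, function_arn)
    events_client.put_rule(**rule)
    events_client.put_targets(**targets)
    return rule, targets
```

In a real deployment `events_client` would be `boto3.client("events")`. The function also needs a resource policy that allows CloudWatch Events to invoke it, which this sketch leaves out.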
To run the functions you’ll need a Streamdata.io account and application key, something that takes a minute to set up. You’ll also need an AWS account to deploy the Lambda function into, with S3 storage activated to establish your data lake. Finally, you need a Twitter account and a token to be able to make ongoing calls to the Twitter API. Once set up, all you do is add your Streamdata.io key and Twitter token to the Lambda function, execute the script, and it begins streaming into your S3 bucket. Streamdata.io will proxy and cache the Twitter API, sending only updates to your AWS Lambda function, which then publishes the incremental updates to your designated S3 bucket based upon the schedule you set up using AWS CloudWatch Events.
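To make the last step more concrete, here is a minimal Python sketch of how a function might load its keys and write each incremental update to S3. It is not the connector's actual code: the environment variable names, the key layout, and the helper names are all assumptions made for illustration, and the S3 client is passed in so the logic can be exercised without AWS.

```python
import json
import os
from datetime import datetime, timezone

# Hypothetical environment variable names; the connector may use different ones.
REQUIRED_SETTINGS = ("STREAMDATA_APP_TOKEN", "TWITTER_TOKEN", "S3_BUCKET")


def load_settings(env=None):
    """Read the keys and bucket name, failing early if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_SETTINGS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_SETTINGS}


def build_object_key(search_term, received_at, prefix="twitter-searches"):
    """Build a time-partitioned S3 key for one incremental update."""
    # Replace anything that is not a letter or digit so the term is key-safe.
    safe_term = "".join(c if c.isalnum() else "-" for c in search_term.lower()).strip("-")
    stamp = received_at.strftime("%Y/%m/%d/%H%M%S-%f")
    return f"{prefix}/{safe_term}/{stamp}.json"


def publish_update(s3_client, bucket, search_term, update, received_at=None):
    """Write one JSON-serializable update to the bucket and return its key."""
    received_at = received_at or datetime.now(timezone.utc)
    key = build_object_key(search_term, received_at)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(update).encode("utf-8"),
        ContentType="application/json",
    )
    return key
```

Inside a Lambda function, `s3_client` would typically be `boto3.client("s3")`. Writing each update as its own timestamped object is just one possible layout; it keeps the function stateless, at the cost of producing many small files in the data lake.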
The next version of our functions will abstract away the accounts and keys needed for Streamdata.io and Twitter, making your AWS account the only thing you need. However, this function should get you started streaming Twitter searches into your data lake, allowing you to monitor conversations that occur via the popular social network, across any topic you choose. It enriches your data lake with relevant signals, which can be used to train machine learning models and drive dashboards, web, mobile, and any other applications you need. It is an efficient way of tapping into valuable 3rd party data sources like Twitter to find the conversations that matter, and make them available for use across your existing infrastructure in the AWS cloud.