We are continuing to roll out AWS Lambda functions that connect to a variety of APIs and stream the results into your existing AWS infrastructure. Next on our list of serverless streaming connectors is one that streams new Hacker News posts into an AWS data lake, so you can train machine learning models on any link submitted to Hacker News or use the data in web, mobile, and other applications. It provides a plug-and-play, scalable way to stream data available via existing APIs into your organization’s data lake.
The Streamdata.io AWS Lambda streaming connector for new Hacker News submissions is available in the AWS Serverless Application Repository, allowing you to deploy the serverless function within your existing AWS infrastructure and proxy Hacker News API calls using Streamdata.io.
To run the function you’ll need a Streamdata.io account and application key, which takes about a minute to set up. You’ll also need an AWS account to deploy the Lambda function into, with S3 storage activated to serve as your data lake. The Hacker News API does not require an API key and is not rate limited, making it easy to begin streaming data into your data lake. Streamdata.io proxies and caches the Hacker News API, sending only updates to your AWS Lambda function, which then publishes the incremental updates to your designated S3 bucket on the schedule you set up using Amazon CloudWatch Events. The Hacker News API has an unusual design pattern: the call for new posts returns only item IDs, so each individual post requires an additional API call. The Lambda function handles this for you, taking each unique ID and making the separate API call.
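To make the ID-then-item pattern concrete, here is a minimal Python sketch of how new posts can be pulled from the public Hacker News API. It is an illustration only, not the connector's actual code: the function names (`fetch_json`, `new_story_ids`, `fetch_new_items`) and the `seen_ids` set are hypothetical, and the Streamdata.io proxying, scheduling, and S3 publishing steps are left out.

```python
import json
from urllib.request import urlopen

# Public Hacker News API base URL (no API key required).
HN_BASE = "https://hacker-news.firebaseio.com/v0"


def fetch_json(url):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def new_story_ids(fetch=fetch_json):
    """Return the newest story IDs; this endpoint returns IDs only."""
    return fetch(f"{HN_BASE}/newstories.json") or []


def fetch_new_items(seen_ids, fetch=fetch_json):
    """Fetch the full record for every new story ID not already in seen_ids.

    `seen_ids` is a hypothetical set of IDs stored on a previous run.
    """
    items = []
    for item_id in new_story_ids(fetch):
        if item_id in seen_ids:
            continue  # already stored, skip the extra API call
        item = fetch(f"{HN_BASE}/item/{item_id}.json")
        if item is not None:  # deleted or missing items come back as null
            items.append(item)
    return items
```

Calling `fetch_new_items(set())` would fetch every post currently listed as new, one request per ID. The `fetch` parameter is there only so the HTTP call can be swapped out, for example to route requests through a proxy or to test without network access.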
The next version of our functions will abstract away the accounts and keys needed for Streamdata.io, making your AWS account the only thing you need. In the meantime, this function should get you started streaming new Hacker News posts into your data lake, allowing you to monitor the conversations that occur on the tech social news site. It enriches your data lake with relevant signals that can be used to train machine learning models and to drive dashboards, web and mobile applications, and anything else you need. The result is an efficient way to tap into valuable third-party data sources like Hacker News, find the signals that matter, and make them available across your existing infrastructure in the AWS cloud.