Streaming GitHub Repo Searches With A GitHub Pages Application Using The GitHub API

We’ve been developing more serverless applications that run on AWS lately, which let you stream data from a variety of API providers into your Amazon S3 data lake(s). Continuing with the theme of building data lakes from 3rd party APIs, we wanted to see whether the GitHub platform could do the same job: a simple JavaScript package that runs entirely on GitHub and streams data from 3rd party APIs, pushing the boundaries of what is possible with APIs, GitHub, and data lakes. As part of this work, we’ve published our first streaming topic subscriptions, built on GitHub repository searches, to GitHub.

This JavaScript micro application runs on GitHub, is hosted by GitHub Pages, and uses Jekyll, HTML, CSS, and JavaScript to deliver the application interface. It connects to the GitHub API and offers five separate searches, each defined using OpenAPI. Each search is proxied through Streamdata.io, which delivers a Server-Sent Events (SSE) stream for that topic, and the results are displayed at the bottom of the page. The result is five separate topical streams from GitHub that can be triggered and monitored in the browser from the GitHub Pages hosted web application.
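To make the flow concrete, here is a minimal sketch of how one topic search could be turned into an SSE subscription in the browser. It is not the prototype's actual code. The function names are hypothetical, and the proxy host, the `X-Sd-Token` query parameter, and the `data` event name are assumptions about how the Streamdata.io proxy is addressed. GitHub authentication is left out.

```javascript
// Sketch only: names below are illustrative, not taken from the prototype.
const GITHUB_SEARCH = "https://api.github.com/search/repositories";
// Assumed proxy endpoint; check your Streamdata.io account for the real one.
const PROXY_BASE = "https://streamdata.motwin.net/";

// Build the GitHub repository search URL for one topic.
function buildSearchUrl(topic) {
  const params = new URLSearchParams({ q: topic, sort: "updated", order: "desc" });
  return `${GITHUB_SEARCH}?${params.toString()}`;
}

// Wrap the search URL with the proxy so it is delivered as an SSE stream.
// Passing the proxy token as a query parameter is an assumption.
function buildStreamUrl(topic, proxyToken) {
  return `${PROXY_BASE}${buildSearchUrl(topic)}&X-Sd-Token=${encodeURIComponent(proxyToken)}`;
}

// Browser only: open the stream and hand each snapshot's repositories
// to a callback (for example, one that renders them on the page).
function subscribe(topic, proxyToken, onItems) {
  const source = new EventSource(buildStreamUrl(topic, proxyToken));
  // "data" is the assumed name of the event carrying a full snapshot.
  source.addEventListener("data", (event) => {
    const snapshot = JSON.parse(event.data);
    onItems(snapshot.items || []);
  });
  return source; // caller can stop the stream with source.close()
}
```

In a page, `subscribe("serverless", token, render)` would start one topical stream, which matches the prototype's one-topic-at-a-time behavior.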

This application requires you to be authenticated with GitHub in order to fetch the search results from the GitHub API, and it also uses the GitHub API to save the results to the repository. This opens up an entirely new way of developing data lakes, which can then be put to work using Git or the GitHub API: data streams in from a variety of 3rd party API sources and is stored within private GitHub repositories, for use in a variety of applications and in the training of machine learning models. It transforms GitHub into not just a data lake, but both the source and the destination of real-time streams of valuable data, and changes the way we look at how we move data around.
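As a rough illustration of the save step, the sketch below builds a request for GitHub's repository contents endpoint (`PUT /repos/{owner}/{repo}/contents/{path}`), which creates or updates a file from base64-encoded content. The function names, the file path, and the commit message are hypothetical, and the prototype may store its results differently.

```javascript
// Sketch only: helper names, paths, and messages are illustrative.

// Base64-encode a string as UTF-8 bytes (works in browsers and recent Node).
function toBase64(text) {
  const bytes = new TextEncoder().encode(text);
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

// Build the URL and fetch options for writing search results to a repo file.
// `sha` is only needed when the file already exists and is being updated.
function buildSaveRequest({ owner, repo, path, token, items, sha }) {
  const body = {
    message: `Update ${path} with ${items.length} search results`,
    content: toBase64(JSON.stringify(items, null, 2)),
  };
  if (sha) body.sha = sha;
  return {
    url: `https://api.github.com/repos/${owner}/${repo}/contents/${path}`,
    options: {
      method: "PUT",
      headers: {
        Authorization: `Bearer ${token}`,
        Accept: "application/vnd.github+json",
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

// Send the request; throws if GitHub rejects it.
async function saveResults(args) {
  const { url, options } = buildSaveRequest(args);
  const response = await fetch(url, options);
  if (!response.ok) throw new Error(`GitHub API returned ${response.status}`);
  return response.json();
}
```

Each successful call produces a commit, so the stored results carry their own history and can be pulled with plain Git.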

Right now the prototype will only stream one topic at a time, but it should demonstrate the potential for streaming from GitHub using its API and Streamdata.io. If you have any questions about the prototype, feel free to submit an issue on the GitHub repository. We have been working on deploying Stack Exchange, Twitter, and Reddit search versions of the same prototype. We will keep publishing collections of topics, with streams of data focused on specific areas, and we’ll eventually open this up to wider search capabilities in future versions. This streaming GitHub topic subscription prototype is not production ready and is only meant to demonstrate the potential of streaming APIs on GitHub. If you are looking for a specific implementation or would like to obtain a more stable version of this micro application, please let us know.

Original source: streamdata.io blog