Streams

Streaming Large Volumes Of JSON Data On GitHub By Switching Across Many Files

We are exploring different ways of using GitHub as an engine for delivering streams of data, pushing forward what is possible when it comes to delivering real-time data using some of the services and tooling we are already using. Our experimentation with GitHub has us thinking about how the code management service can be used to store large numbers of JSON files, and then deliver them as ephemeral streams of data that people can subscribe to.

Streamdata.io can proxy and stream any JSON data, delivering updates via Server-Sent Events (SSE) and JSON Patch. Normally this is done with a dynamic JSON API, where the content is being updated by a back-end database or other system. When using GitHub your JSON files are static, but you can still introduce dynamic changes using Git, or the GitHub API. You can expand upon this further by how you use GitHub, “sharding” large amounts of data as separate JSON files stored within a single GitHub repository, across multiple GitHub repositories, or even across multiple GitHub organizations, triggering incremental updates of a real-time stream by proxying and shifting across the JSON files stored on GitHub.
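To make the mechanics concrete, here is a minimal sketch of a client subscribing to one such proxied file. Everything specific in it is an assumption for illustration: the raw.githubusercontent.com file, the placeholder token, and the third-party sseclient-py and jsonpatch packages; the "data" and "patch" event types follow Streamdata.io's SSE conventions, where the first event is a full snapshot and later events are JSON Patch diffs.

```python
import json

import jsonpatch   # pip install jsonpatch
import requests    # pip install requests
import sseclient   # pip install sseclient-py

# Hypothetical target: a raw JSON file in a GitHub repository.
TARGET_URL = "https://raw.githubusercontent.com/example-org/example-repo/master/data.json"
SD_TOKEN = "YOUR_STREAMDATA_TOKEN"  # placeholder Streamdata.io app token

# Streamdata.io proxies the target URL and pushes changes over SSE.
stream_url = f"https://streamdata.motwin.net/{TARGET_URL}?X-Sd-Token={SD_TOKEN}"

response = requests.get(stream_url, stream=True,
                        headers={"Accept": "text/event-stream"})
client = sseclient.SSEClient(response)

document = None
for event in client.events():
    if event.event == "data":
        # The first event carries the full JSON snapshot.
        document = json.loads(event.data)
    elif event.event == "patch":
        # Subsequent events carry RFC 6902 JSON Patch operations.
        document = jsonpatch.apply_patch(document, json.loads(event.data))
    print(document)
```

From the subscriber's point of view, a static file that changes via Git commits looks no different than a dynamic API being updated by a database.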

You can’t use Streamdata.io to switch to different JSON paths without terminating and restarting a stream. However, it wouldn’t take much work to create a standalone proxy that switches across a known or unknown number of JSON files, in a known or unknown number of GitHub repositories, across one or more GitHub organizations, on a specific schedule and at a specific speed. Such a proxy would bridge the separation between the many different JSON files stored on GitHub, delivering the data they contain as a continuous stream, stitched together in a specific order, at a scheduled series of intervals. This lets you stream large amounts of data that can be more easily managed and stored as a series of static JSON files, versioned using Git.
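As a rough illustration of that idea, the sketch below stands up a small SSE endpoint that walks an ordered list of shard files and stitches them into one stream. The shard URLs, the interval, and the /stream route are all hypothetical, and it leans on the Flask, requests, and jsonpatch Python packages; a production proxy would also need error handling, discovery of files via the GitHub API, and so on.

```python
import json
import time

import jsonpatch   # pip install jsonpatch
import requests    # pip install requests
from flask import Flask, Response   # pip install flask

app = Flask(__name__)

# Hypothetical shards: ordered JSON files stored in a GitHub repository.
SHARDS = [
    "https://raw.githubusercontent.com/example-org/shard-repo/master/2018-01.json",
    "https://raw.githubusercontent.com/example-org/shard-repo/master/2018-02.json",
    "https://raw.githubusercontent.com/example-org/shard-repo/master/2018-03.json",
]
INTERVAL_SECONDS = 30  # how long each shard "plays" before switching

def stitch_shards():
    """Walk the shards in order, emitting a snapshot then patches, SSE-style."""
    previous = None
    for url in SHARDS:
        current = requests.get(url).json()
        if previous is None:
            # The first shard goes out as a full snapshot.
            yield f"event: data\ndata: {json.dumps(current)}\n\n"
        else:
            # Later shards go out as RFC 6902 diffs against the previous one.
            patch = jsonpatch.make_patch(previous, current)
            yield f"event: patch\ndata: {patch.to_string()}\n\n"
        previous = current
        time.sleep(INTERVAL_SECONDS)

@app.route("/stream")
def stream():
    return Response(stitch_shards(), mimetype="text/event-stream")

if __name__ == "__main__":
    app.run(threaded=True)
```

Because each shard after the first is sent as a diff against the previous one, subscribers see one continuous, evolving JSON document rather than a series of disconnected files.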

Hopefully this example of using GitHub and Streamdata.io to manage data, and deliver it as streams, pushes forward your understanding of what is possible with streaming APIs. We want to expand your horizons when it comes to thinking about streaming data, and challenge the assumption that it is always about streaming data from the current moment. We also want to push forward how you manage large volumes of data, and your understanding of how you can use GitHub, and Git, to manage data, sharding it across meaningful units of value stored in single static JSON files. This allows us to orchestrate our data pipelines in different ways, using the tools we already depend on to get the job done as quickly and efficiently as we possibly can.


**Original source: streamdata.io blog**