Using Long Running PHP cURL Script To Write Server-Sent Events (SSE) Responses To AWS S3 Data Lakes

The most common use of Streamdata.io is to take data streams from common JSON APIs and display them in real time in the browser and in mobile applications, essentially going from server to human via a browser. However, you can also receive Server-Sent Events (SSE) on the server side, going from server to server in real time and cutting the human out of the equation. I’m currently streaming over 50 financial market data APIs using Streamdata.io and publishing the streams to data lakes on Amazon S3 for use in training machine learning models, and I wanted to share the basic PHP script I am using to get the job done.

Connecting with Streamdata.io on the server using PHP isn’t much different from a regular PHP cURL call, except that the connection is long running, so you need a separate function to buffer each response and do something with the results. In this case I’m taking each response and logging the details of each transaction, as well as the JSON response, to Amazon S3. Here is a GitHub Gist with the long-running PHP cURL script for handling the Server-Sent Events (SSE), minus the Amazon S3 detail to keep it simpler.
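
As a rough illustration of that pattern (a sketch, not the script from the Gist), the code below opens a long-running cURL connection and uses `CURLOPT_WRITEFUNCTION` to buffer incoming chunks until a complete SSE event has arrived. The function names `make_sse_writer` and `stream_sse`, and whatever URL you pass in, are assumptions made for this example.

```php
<?php
// Sketch only: function names and the handler signature are illustrative.

/**
 * Build a cURL write callback that buffers chunks and emits complete SSE events.
 * Events are separated by a blank line. A single chunk may contain part of an
 * event or several events, so data is buffered until a separator is seen.
 */
function make_sse_writer(callable $onEvent): callable
{
    $buffer = '';

    return function ($ch, string $chunk) use (&$buffer, $onEvent): int {
        // Append, then normalize CRLF to LF (lone CR line endings are not handled here).
        $buffer = str_replace("\r\n", "\n", $buffer . $chunk);

        // Hand off every complete event currently sitting in the buffer.
        while (($pos = strpos($buffer, "\n\n")) !== false) {
            $raw = substr($buffer, 0, $pos);
            $buffer = (string) substr($buffer, $pos + 2);
            if ($raw !== '') {
                $onEvent($raw);
            }
        }

        // cURL aborts the transfer unless the callback returns the bytes it handled.
        return strlen($chunk);
    };
}

/**
 * Open the stream and block until the server closes it or an error occurs.
 */
function stream_sse(string $url, callable $onEvent): void
{
    set_time_limit(0); // let the PHP script itself run indefinitely

    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_HTTPHEADER     => ['Accept: text/event-stream'],
        CURLOPT_WRITEFUNCTION  => make_sse_writer($onEvent),
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 0, // no overall timeout on the transfer
    ]);

    curl_exec($ch);

    if (curl_errno($ch) !== 0) {
        error_log('Stream ended with error: ' . curl_error($ch));
    }
}
```

A call such as `stream_sse($url, function (string $raw): void { echo $raw, PHP_EOL; });` would print each raw event as it arrives, where `$url` is whatever streaming endpoint and token you have been issued.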

It looks like most basic GET cURL PHP scripts you’d see, except for that added function for handling the stream. It is a little clumsy in how it receives and cleans up the stream, which is something I’ll polish further as I handle more types of responses. I’m just looking to get a proof of concept running, and next I will be pumping up the amount of data I’m running through it, which will help me harden how I handle responses.
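
One way the clean-up step could be tightened is to parse each raw event into its SSE fields before touching the JSON. The sketch below follows the general SSE field rules (comment lines starting with `:`, one optional space after the colon, multiple `data:` lines joined with newlines). The function names are hypothetical, and the `patch` event type in the usage note is only an example value.

```php
<?php
// Sketch only: parse_sse_event() and decode_event_data() are illustrative names.

/**
 * Split one raw SSE event block into its event type, id, and data payload.
 */
function parse_sse_event(string $raw): array
{
    $event = ['event' => 'message', 'id' => null, 'data' => ''];
    $dataLines = [];

    foreach (preg_split('/\r\n|\r|\n/', $raw) as $line) {
        if ($line === '' || $line[0] === ':') {
            continue; // blank line or SSE comment (often used as a keep-alive)
        }

        $parts = explode(':', $line, 2);
        $field = $parts[0];
        $value = $parts[1] ?? '';

        // Only a single leading space after the colon is stripped.
        if (substr($value, 0, 1) === ' ') {
            $value = (string) substr($value, 1);
        }

        if ($field === 'data') {
            $dataLines[] = $value;
        } elseif ($field === 'event' || $field === 'id') {
            $event[$field] = $value;
        }
    }

    // Multiple data lines belong to one payload, joined by newlines.
    $event['data'] = implode("\n", $dataLines);

    return $event;
}

/**
 * Decode the JSON payload of a parsed event, or return null if it is not valid JSON.
 */
function decode_event_data(array $event)
{
    $decoded = json_decode($event['data'], true);

    return json_last_error() === JSON_ERROR_NONE ? $decoded : null;
}
```

For example, `parse_sse_event("event: patch\ndata: []")` yields an array whose `event` key is `patch` and whose `data` key holds the string `[]`, which `decode_event_data()` then turns into an empty PHP array.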

I’m essentially using Streamdata.io to stream existing API responses into a data lake on Amazon S3. This particular use case is all about financial market data, but I’m also beginning to do this with social, transit, commerce, media, and other APIs I’m working to profile as part of my Streamdata.io API Gallery work. If there is a specific type of API you’d like to see me profile, or you would like to understand how I’m using Streamdata.io to develop real-time data lakes for use in machine learning models, feel free to connect, and I am happy to answer your questions. There are thousands of APIs to choose from out there, and I’m just going to keep profiling the interesting ones we have already targeted, unless I hear about something more interesting from you.
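
The S3 side was left out of the script above, so here is a hedged sketch of how each event might be turned into a data lake object. The date-partitioned key layout, the `market-data` prefix, and both function names are assumptions for illustration. The upload itself is passed in as a callable so the sketch stays independent of any particular SDK setup.

```php
<?php
// Sketch only: key layout, prefix, and function names are illustrative assumptions.

/**
 * Build a date-partitioned object key such as
 * "market-data/2018/05/01/143005-000012-patch.json".
 * The sequence number keeps keys unique when several events share a second.
 */
function build_s3_key(string $prefix, string $eventType, DateTimeImmutable $at, int $sequence): string
{
    return sprintf(
        '%s/%s/%s-%06d-%s.json',
        trim($prefix, '/'),
        $at->format('Y/m/d'),
        $at->format('His'),
        $sequence,
        $eventType
    );
}

/**
 * Serialize the payload as JSON and hand it to the supplied upload callable.
 * The callable receives an array with Bucket, Key, Body, and ContentType entries.
 */
function store_event(callable $putObject, string $bucket, string $key, array $payload): void
{
    $putObject([
        'Bucket'      => $bucket,
        'Key'         => $key,
        'Body'        => json_encode($payload),
        'ContentType' => 'application/json',
    ]);
}
```

With the AWS SDK for PHP installed, the callable could be a thin wrapper such as `function (array $args) use ($s3) { $s3->putObject($args); }`, where `$s3` is an `Aws\S3\S3Client` instance you have configured separately.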

**Original source: streamdata.io blog**