The most common usage of Streamdata.io is to take data streams from common JSON APIs and display them in real time in the browser or in mobile applications, essentially going from server to human. However, you can also receive Server-Sent Events (SSE) on the server side, going from server to server in real time and cutting the human out of the equation. I’m currently streaming over 50 financial market data APIs using Streamdata.io and publishing the streams to data lakes on Amazon S3 for use in training machine learning models, and I wanted to share the basic PHP script I am using to get the job done.
Connecting with Streamdata.io on the server using PHP isn’t much different than a regular PHP cURL call, except that the connection is long running, so you need a separate callback function to buffer each chunk of the response and do something with the results. In this case I’m taking each response and logging the details of each transaction, as well as the JSON payload, to Amazon S3. Here is the long-running PHP cURL script from my GitHub Gist for handling the Server-Sent Events (SSE), minus the Amazon S3 detail to keep it simpler.
<?php
// prepare any headers for the API call (e.g. Accept: text/event-stream for SSE)
$request_headers = array();
// the URL of the API I am calling
$api_url = "http://api.example.com?parameters=whatever";
// prepend the Streamdata.io proxy (the service also expects your app token
// appended as an X-Sd-Token query parameter)
$url = 'https://streamdata.motwin.net/' . $api_url;
// set up the API request, routing each chunk through the callback below
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_WRITEFUNCTION, 'myfunc');
// if there are any headers, add them to the request
if(count($request_headers) > 0)
{
    curl_setopt($ch, CURLOPT_HTTPHEADER, $request_headers);
}
$result = curl_exec($ch);
curl_close($ch);

// this callback runs for each chunk the long-running connection delivers
function myfunc($ch, $data)
{
    // how big is this chunk of the transmission
    $bytes = strlen($data);
    // the buffer persists across calls while the connection stays open
    static $buf = '';
    $buf .= $data;
    // collect the details of each transmission
    $info = curl_getinfo($ch);
    $http_code = $info['http_code'];
    $total_time = $info['total_time'];
    $namelookup_time = $info['namelookup_time'];
    $connect_time = $info['connect_time'];
    $size_download = $info['size_download'];
    $speed_download = $info['speed_download'];
    $download_content_length = $info['download_content_length'];
    // pull complete lines off the front of the buffer
    while(true)
    {
        $pos = strpos($buf, "\n");
        if($pos === false)
        {
            break;
        }
        // trim things down
        $line = substr($buf, 0, $pos + 1);
        $buf = substr($buf, $pos + 1);
        // only log if there is something there
        if(strlen($line) > 50)
        {
            // remove the data: prefix
            $results = str_replace("data:", "", $line);
            // Log the details of the transaction to Amazon S3 (or other)
            // Log the content of the transaction to Amazon S3 (or other)
        }
    }
    // this is important!
    // cURL aborts the transfer unless we return the exact number of bytes handled
    return $bytes;
}
?>
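As for the Amazon S3 detail I stripped out of the Gist, the logging inside that callback can look something like the following minimal sketch using the AWS SDK for PHP. The log_to_s3() helper, bucket name, region, and key layout are my own assumptions for illustration, not part of the actual script.

<?php
// minimal sketch of the elided S3 logging, using the AWS SDK for PHP (v3)
require 'vendor/autoload.php';
use Aws\S3\S3Client;

// hypothetical helper, called from inside the write callback above with the
// cleaned-up JSON payload in $results and the curl_getinfo() details in $info
function log_to_s3($results, $info)
{
    // in a long-running script you would create the client once and reuse it
    $s3 = new S3Client(array(
        'version' => 'latest',
        'region'  => 'us-east-1' // assumed region
    ));
    // one object per event, keyed by timestamp (assumed key layout)
    $key = 'streams/market-data/' . microtime(true) . '.json';
    $s3->putObject(array(
        'Bucket'      => 'my-data-lake', // assumed bucket name
        'Key'         => $key,
        'Body'        => $results,
        'ContentType' => 'application/json'
    ));
    // the transaction details in $info could be written as a second object alongside
}
?>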
The script above looks like most basic GET cURL PHP scripts you’ve seen, except for the added callback function for handling the stream. It is a little clumsy in how it receives and cleans up the stream, which is something I’ll polish further as I handle more types of responses (one direction for that is sketched below). For now I’m just looking to get a proof of concept running; next I will increase the amount of data I’m running through it, which will help me harden how I handle responses.
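One way to polish it, shown here as a rough sketch rather than the finished script, is to frame on the blank line that terminates each Server-Sent Event instead of splitting on single newlines, and to parse the standard field names rather than string-replacing the data: prefix. The parse_sse_events() function below is hypothetical, not something from the Gist; you would call it from the write callback after appending each chunk to the buffer.

<?php
// sketch: buffer until the blank line that terminates each SSE event,
// then parse the standard field names instead of string-replacing "data:"
function parse_sse_events(&$buf)
{
    $events = array();
    // events are delimited by a blank line ("\n\n")
    while(($pos = strpos($buf, "\n\n")) !== false)
    {
        $raw = substr($buf, 0, $pos);
        $buf = substr($buf, $pos + 2);
        $event = array('event' => 'message', 'data' => '');
        foreach(explode("\n", $raw) as $line)
        {
            // skip blank lines, comments, and lines without a field name
            if($line == '' || $line[0] == ':' || strpos($line, ':') === false)
            {
                continue;
            }
            list($field, $value) = explode(':', $line, 2);
            // the SSE spec strips a single leading space from the value
            if(isset($value[0]) && $value[0] == ' ')
            {
                $value = substr($value, 1);
            }
            if($field == 'data')
            {
                // multiple data lines are joined with newlines
                $event['data'] .= ($event['data'] == '' ? '' : "\n") . $value;
            }
            elseif($field == 'event' || $field == 'id')
            {
                $event[$field] = $value;
            }
        }
        $events[] = $event;
    }
    return $events;
}
?>

Keeping the event field around also matters with Streamdata.io, which uses it to distinguish the initial snapshot from the incremental updates that follow.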
I’m essentially using Streamdata.io to stream existing API responses into a data lake on Amazon S3. This particular use case is all about financial market data, but I’m also beginning to do this with social, transit, commerce, media, and other APIs I’m working to profile as part of my Streamdata.io API Gallery work. If there is a specific type of API you’d like to see me profile, or if you’d like to understand how I’m using Streamdata.io to develop real-time data lakes for use in machine learning models, feel free to connect, and I’m happy to answer your questions. There are thousands of APIs to choose from out there, and I’m just going to keep profiling the interesting ones we have already targeted, unless I hear about something more interesting from you.