Streaming Data into Google BigQuery Using Streamdata.io

We’ve already explored how you can stream data into your data lakes using Streamdata.io. Continuing this series, we want to explore how you can stream different data sources into Google BigQuery, allowing you to do large-scale analysis on all your data streams in aggregate. Google provides ready-to-go functions as part of BigQuery that allow you to stream data into the big data solution; all you have to do is wrap each of the scripts they have available with a Streamdata.io wrapper.

Google BigQuery has the concept of table templates that you can use to manage your table structures, which is something you can leverage in the first step of initiating any Streamdata.io connection. Once the initial handshake is made with Streamdata.io and you get that first JSON response, all you have to do is extract the schema and then either create a new template, update an existing one on the fly, or create a new table to stream into. You can have a different table template for each unique API path and its resulting schema, and always handle table setup as part of the first handshake and response from the Streamdata.io proxy.
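As a rough illustration of the schema extraction step, here is a minimal Python sketch that walks the first JSON response and produces a list of BigQuery-style field definitions. The function names (infer_field, infer_schema) and the sample record are our own assumptions for illustration, not part of any Streamdata.io or Google SDK, and the type mapping is deliberately simplistic: it looks only at the first record and the first element of each array.

```python
from typing import Any, Dict, List


def infer_field(name: str, value: Any) -> Dict[str, Any]:
    """Map one JSON value to a BigQuery-style field definition (hypothetical helper)."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return {"name": name, "type": "BOOLEAN", "mode": "NULLABLE"}
    if isinstance(value, int):
        return {"name": name, "type": "INTEGER", "mode": "NULLABLE"}
    if isinstance(value, float):
        return {"name": name, "type": "FLOAT", "mode": "NULLABLE"}
    if isinstance(value, dict):
        return {
            "name": name,
            "type": "RECORD",
            "mode": "NULLABLE",
            "fields": infer_schema(value),
        }
    if isinstance(value, list):
        # Assumption: arrays are homogeneous, so the first element decides the type.
        if not value:
            return {"name": name, "type": "STRING", "mode": "REPEATED"}
        field = infer_field(name, value[0])
        field["mode"] = "REPEATED"
        return field
    # Strings and nulls fall back to STRING.
    return {"name": name, "type": "STRING", "mode": "NULLABLE"}


def infer_schema(record: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Build a field list from the first record of the first JSON response."""
    return [infer_field(key, value) for key, value in record.items()]


if __name__ == "__main__":
    # Hypothetical first response from a market data API.
    first_response = [{"symbol": "ABC", "price": 101.5, "volume": 1200}]
    first_record = first_response[0] if isinstance(first_response, list) else first_response
    print(infer_schema(first_record))
```

The resulting field list is what you would hand to whichever BigQuery client you use when creating the template table or the destination table. A production version would want to sample more than one record and handle nulls and mixed-type arrays explicitly.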

Streaming data into BigQuery isn’t meant for all scenarios. It works best in the following ones:

– Not Transactional – High volume, continuously appended rows. The app can tolerate a rare possibility that duplication might occur or that data might be temporarily unavailable.
– Aggregate Analysis – Queries generally are performed for trend analysis, as opposed to single or narrow record selection.
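Because duplication is a possibility, it helps to attach an insert ID to each row so BigQuery can attempt best-effort de-duplication. The sketch below builds a request body in the shape used by the BigQuery tabledata.insertAll REST method (rows with insertId and json, plus an optional templateSuffix for template tables). The helper names and the choice to derive the ID from a hash of the row contents are our own assumptions. Note that a content hash will also collapse rows that are legitimately identical, so include a timestamp or sequence number in the row if that matters for your stream.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional


def row_insert_id(row: Dict[str, Any]) -> str:
    """Derive a stable ID from the row contents (assumed strategy, not a BigQuery requirement)."""
    # Sorting keys makes the ID independent of key order in the incoming JSON.
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_insert_all_body(
    rows: List[Dict[str, Any]], template_suffix: Optional[str] = None
) -> Dict[str, Any]:
    """Build a tabledata.insertAll-style request body for a batch of rows."""
    body: Dict[str, Any] = {
        "rows": [{"insertId": row_insert_id(row), "json": row} for row in rows]
    }
    if template_suffix:
        # With a suffix, BigQuery treats the destination table as a template.
        body["templateSuffix"] = template_suffix
    return body


if __name__ == "__main__":
    batch = [{"symbol": "ABC", "price": 101.5, "ts": "2018-01-01T00:00:00Z"}]
    print(json.dumps(build_insert_all_body(batch, template_suffix="_quotes"), indent=2))
```

If a network error forces you to resend a batch, the same rows produce the same insert IDs, which gives BigQuery a chance to drop the repeats.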

For workloads that fit these scenarios, BigQuery is another option for developing, evolving, and analyzing large volumes of data that you can stream in via your Streamdata.io applications. We recommend grabbing the Node.js, Java, or .NET BigQuery samples and merging them with the Streamdata.io Node.js, Java, or .NET SDKs to connect your streams to the platform. That should give you what you need to build data stores in BigQuery for analysis using Streamdata.io streams, adding to our real-time data stack.
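If you are not using one of the SDKs, the glue code mostly comes down to reading the event stream and deciding what to do with each event. Here is a small Python sketch of a Server-Sent Events parser that turns raw stream lines into (event name, JSON payload) pairs. It assumes the proxy sends an initial snapshot as a "data" event and later changes as "patch" events containing JSON Patch operations; treat those event names as an assumption to verify against the stream you actually receive. The parse_sse function is our own helper, not an SDK call.

```python
import json
from typing import Any, Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, Any]]:
    """Yield (event_name, decoded_json) for each complete event in an SSE stream."""
    event = "message"  # default event name when no "event:" field is present
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Only a single leading space after the colon is part of the framing.
            data.append(value[1:] if value.startswith(" ") else value)


if __name__ == "__main__":
    sample = [
        "event: data\n",
        'data: [{"symbol": "ABC", "price": 101.5}]\n',
        "\n",
        "event: patch\n",
        'data: [{"op": "replace", "path": "/0/price", "value": 101.7}]\n',
        "\n",
    ]
    for name, payload in parse_sse(sample):
        if name == "data":
            print("snapshot rows to insert:", payload)
        elif name == "patch":
            print("patch operations to apply or append:", payload)
```

From here, the snapshot is the natural place to set up the table, and each patch can either be applied to local state or appended as its own row, depending on how you want to query the history later.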

Next, we’ll explore some of the benefits of using BigQuery, and once we’ve streamed some market data, transit data, or maybe some news and blog feeds from a number of sources, we’ll publish more stories here on the blog. We are looking to understand some of the compute and analysis benefits of using Google BigQuery to manage data aggregated from streams, but we also want to learn more about how you can organize, segment, and route your streams into different tables, then tag and group them to help make sense of data in real time. The goal is to offload the compute and heavy lifting associated with making sense of large amounts of data to Google, while doing it in a way that also helps you organize many different stream sources into a massive warehouse that makes sense.