How to Identify Low Hanging Fruit For APIs From Data On Your Website

As part of our partnership with API Evangelist, one of the services we deliver is identifying the low hanging fruit for API deployment from data that already exists on a website. We find that many companies are eager to do APIs but don’t always know where to start, even though they already publish large quantities of valuable data on their websites. We consider this data low hanging fruit: if it is already being published to the website, it has value and there is already a need for it, making it a perfect candidate for turning into a web API that complements your existing web presence.

Web APIs are just the next evolution of the web: the first phase was about delivering data and content to humans in the browser, and the second phase is about delivering the same data and content to other systems and applications using APIs. To help our customers identify the low hanging fruit that exists on their website, we have a pretty simple script that we can turn on and run for as long as we need. To get started, we seed the script with the root URL of the website we are targeting. Starting from that page, the script then:

– Parses every URL on the page and stores them in a database.
– Counts every table, and the number of rows that exist on the page.
– Counts every form that exists on the page.
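
The per-page step above can be sketched roughly as follows. This is a minimal illustration using only Python's standard library, not the actual script: the names PageScanner and scan_page are hypothetical, and the database storage mentioned above is left out, so links are simply collected in a list.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageScanner(HTMLParser):
    """Collects links and counts tables, table rows, and forms on one page.

    Hypothetical helper for illustration; the real script may work differently.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []   # absolute URLs found in <a href="...">
        self.tables = 0   # number of <table> elements
        self.rows = 0     # number of <tr> elements across all tables
        self.forms = 0    # number of <form> elements

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))
        elif tag == "table":
            self.tables += 1
        elif tag == "tr":
            self.rows += 1
        elif tag == "form":
            self.forms += 1


def scan_page(html, base_url):
    """Parse one HTML document and return the populated scanner."""
    scanner = PageScanner(base_url)
    scanner.feed(html)
    scanner.close()
    return scanner
```

Calling scan_page(html, url) on a fetched page gives back the links to follow next, along with the table, row, and form counts for that page.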

The script then repeats this for every URL it finds on the page. It ignores external URLs and only spiders URLs that are within the seed domain. It also ignores images, Word documents, video, and many other common objects that exist on the website. However, in addition to counting the tables and forms, the script identifies, sets aside, and counts:

– Every CSV file
– Every XML file
– Every JSON file
– Every XLS or XLSX file
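
As a rough sketch of the spidering rules described above, the following Python shows one way to stay inside the seed domain, skip media files, and tally data files by type. Everything here is an assumption made for illustration: the function names, the extension lists, the exact-host comparison (which treats www.example.com and example.com as different sites), and the fetch_links callback that stands in for downloading and parsing a page.

```python
import posixpath
from collections import Counter, deque
from urllib.parse import urldefrag, urlparse

# Illustrative extension lists; a real script would likely cover more types.
DATA_EXTENSIONS = {".csv": "csv", ".xml": "xml", ".json": "json",
                   ".xls": "spreadsheet", ".xlsx": "spreadsheet"}
IGNORED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".doc", ".docx",
                      ".mp4", ".mov"}


def classify_url(url, seed_url):
    """Return 'external', 'ignored', 'page', or a data file type."""
    parsed = urlparse(url)
    # Simplification: only an exact host match counts as the seed domain.
    if parsed.netloc.lower() != urlparse(seed_url).netloc.lower():
        return "external"
    ext = posixpath.splitext(parsed.path)[1].lower()
    if ext in DATA_EXTENSIONS:
        return DATA_EXTENSIONS[ext]
    if ext in IGNORED_EXTENSIONS:
        return "ignored"
    return "page"


def crawl(seed_url, fetch_links, max_pages=1000):
    """Breadth-first crawl of the seed domain.

    fetch_links(url) is a caller-supplied function returning the absolute
    URLs found on that page. Returns (pages_visited, data_file_counts).
    """
    queue = deque([seed_url])
    seen = {seed_url}
    data_files = Counter()
    pages = 0
    while queue and pages < max_pages:
        url = queue.popleft()
        pages += 1
        for link in fetch_links(url):
            link = urldefrag(link).url  # drop "#fragment" so pages dedupe
            if link in seen:
                continue
            seen.add(link)
            kind = classify_url(link, seed_url)
            if kind == "page":
                queue.append(link)        # spider it later
            elif kind in DATA_EXTENSIONS.values():
                data_files[kind] += 1     # set aside and count
    return pages, data_files
```

Passing the link fetcher in as a callback keeps the crawl logic separate from networking, which makes it easy to try against a small in-memory map of pages before pointing it at a live site.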

After running the script for several days, we have spidered every page within a website and identified every table, form, CSV, XML, JSON, and spreadsheet file. Together, these make up the low hanging fruit we are looking for on any existing website. The resulting report paints a pretty interesting picture of the website of any company, organization, institution, or government agency, and the process ends up uncovering some pretty compelling stories about how a company conducts itself online.

Once we are done, we produce a list of all the potential data targets, with an accompanying executive report on what we discovered along the way. Ideally, everything we identify on a website will become an API, but the reality is that most companies won’t have the resources to tackle everything immediately. Our objective with this service is to help organizations realize how much valuable data and content they are already publishing, and to help them in their API journey. The further along they are in that journey, the more likely they are to need Streamdata.io services. We are in the business of helping our customers be successful, and of making sure they can sensibly use the real-time and event-driven API solutions we deliver.