How do we look for historical data in OpenAPI parameters?
There is a wealth of data within the growing number of APIs available online today. As we work to profile new APIs for the Streamdata.io API Gallery and optimize our approach to filtering, searching, and understanding what is relevant across a large number of APIs, we are investing heavily in the ability to search for API providers who offer archival data. Many of the customers we are talking with, especially those who operate in the financial sector, are looking for data with two to five years of history, something that doesn’t always exist in the API world.
We find a lot of interesting APIs, but we also need to be able to identify the ones that potentially offer historical data. We can’t look at every API manually to see whether it offers an archive; this is something we need to do at scale, and we find that searching OpenAPI parameters is the best way to automate the discovery of historical data. Since we profile all of the APIs in the Streamdata.io API Gallery using OpenAPI, documenting the details of each API, we can easily reuse the data we have already gathered for this discovery.
There are a number of ways to search for signs of historical data at scale across hundreds or thousands of OpenAPI definitions. One good way to find APIs that might have historical components is to look for occurrences of “date” in parameter names, but of course, you need to filter out some common false matches like “validate” and “update”, along with other variants, to keep the results focused and clean. If one of an API’s paths has a date parameter, there is the potential that the API can be searched, filtered, and worked with by date. Of course, it will depend on how far back the data goes, but an API endpoint possessing a date parameter provides a very good start.
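To make the “date” search concrete, here is a minimal sketch in Python. It assumes each OpenAPI definition has already been loaded into a dictionary (for example from JSON or YAML). The function names and the exclusion list are our own illustrations rather than part of any tool, and `$ref` parameters are not resolved.

```python
# Hypothetical exclusion list: names that contain "date" but are not dates.
FALSE_POSITIVES = ("validate", "update", "candidate", "mandate")

# Keys of an OpenAPI path item that describe operations.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}


def looks_like_date(name: str) -> bool:
    """Return True if a parameter name still contains "date" once
    the known false positives have been stripped out."""
    lowered = name.lower()
    for word in FALSE_POSITIVES:
        lowered = lowered.replace(word, "")
    return "date" in lowered


def find_date_parameters(spec: dict) -> list:
    """List (METHOD, path, parameter name) for every date-like parameter."""
    hits = []
    for path, item in spec.get("paths", {}).items():
        shared = item.get("parameters", [])  # parameters declared on the path
        for method, operation in item.items():
            if method not in HTTP_METHODS:
                continue
            for param in shared + operation.get("parameters", []):
                name = param.get("name", "")  # unresolved $ref entries have no name
                if looks_like_date(name):
                    hits.append((method.upper(), path, name))
    return hits


# A tiny made-up definition, purely for illustration.
sample = {
    "paths": {
        "/quotes": {
            "get": {"parameters": [{"name": "start_date", "in": "query"},
                                   {"name": "validate", "in": "query"}]},
            "post": {"parameters": [{"name": "update", "in": "query"}]},
        }
    }
}

print(find_date_parameters(sample))  # [('GET', '/quotes', 'start_date')]
```

Stripping the false positives before testing for “date” means a name like “updated_date” is still flagged, while “update” on its own is not.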
Next, we look at other related words that might produce historical results: parameter names like “history”, “historical”, “archive”, “past”, and “year”, and others that might be historically relevant. This gives us a growing vocabulary for flagging APIs, after which we manually check how far back each API’s data store actually goes. Hopefully, we can automate other aspects of the historical data discovery process and optimize how we find archival data. For now, refining our search and then manually reviewing each individual API will give us more insight into the best ways to prioritize which APIs we should be looking at.
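As a rough illustration of flagging with a vocabulary, the Python sketch below scans a set of OpenAPI definitions and lists the ones worth a manual look. It assumes the definitions are already loaded as dictionaries keyed by an API name. The function names, the whole-word matching rule, and the ordering by number of matches are our own assumptions, not a description of how the Gallery does it.

```python
import re

# Vocabulary from the terms named above; extend it as the search is refined.
HISTORY_TERMS = ("history", "historical", "archive", "past", "year")

# Keys of an OpenAPI path item that describe operations.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}


def split_words(name: str) -> list:
    """Break names like "startYear" or "archive_id" into lowercase words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return [w.lower() for w in re.split(r"[^A-Za-z0-9]+", spaced) if w]


def matched_terms(name: str) -> set:
    """Vocabulary terms that appear as whole words (or simple plurals)."""
    words = split_words(name)
    return {t for t in HISTORY_TERMS if t in words or t + "s" in words}


def collect_parameter_names(spec: dict) -> list:
    """Gather parameter names declared on paths and on their operations."""
    names = []
    for item in spec.get("paths", {}).values():
        names += [p.get("name", "") for p in item.get("parameters", [])]
        for method, operation in item.items():
            if method in HTTP_METHODS:
                names += [p.get("name", "") for p in operation.get("parameters", [])]
    return names


def flag_for_review(specs: dict) -> list:
    """Return (api name, matching parameter names), most matches first."""
    flagged = []
    for api_name, spec in specs.items():
        hits = sorted({n for n in collect_parameter_names(spec) if matched_terms(n)})
        if hits:
            flagged.append((api_name, hits))
    # Strongest candidates first, so manual review starts where it pays off.
    flagged.sort(key=lambda entry: (-len(entry[1]), entry[0]))
    return flagged
```

Matching on whole words keeps the vocabulary from firing on unrelated names, at the cost of missing variants that are not in the list. The output is only a shortlist: someone still has to check how far back each flagged API’s data really goes.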
This effort doesn’t tell us with 100% certainty which APIs have historical data, but it does give us a head start when it comes to finding them. We are discovering and indexing new APIs each day, and the more OpenAPI definitions we find and produce, the wider our search index becomes for finding valuable data. The wider and more precise our vocabulary becomes, the more we’ll learn about identifying APIs that have historical data. Currently, we are refining our historical API search vocabulary, and once we identify a logical set of APIs, we’ll publish the list to the Streamdata.io API Gallery. That will provide an easier way to identify the APIs that have the highest Streamrank and also offer historical data that can be accessed.
Photo Credits: Carolina Prysyazhnyuk