Four problems with screen scraping that an API-First approach solves

Screen scraping (also known as web scraping), like many other software techniques, is a way to gather information. Unfortunately, criminals also use it to steal data rather than simply gather it, but that is not what its creators intended.

Budgeting applications frequently use screen scraping technologies to show users where their money is coming from and going to in real time. This is often benign because it provides a valuable service.

So, let’s take a look at where and how web scraping started, how companies legitimately use it, and how others can protect themselves from becoming victims.

What is screen scraping?

Screen scraping is what a developer might do to get access to information that’s usually only shared via a webpage. The idea is to “scrape the screen”: to programmatically take what the user would normally see on the screen so that the developer can get access to the data outside of the “application” (web page/web app) in which it’s presented.

The screen scraper uses code to access a webpage the same way a user would. The code pretends to be the user in a browser, receives the response, and instead of displaying it in a browser, parses it to extract the desired information on the page.
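To make the mechanics concrete, here is a minimal sketch of the parsing half of a scraper, using only Python's standard library. The page markup, the `balance` class name, and the `BalanceScraper` helper are all assumptions made up for illustration; a real scraper would first log in with the user's credentials and download the page, which is omitted here.

```python
from html.parser import HTMLParser

# Hypothetical page markup; a real scraper would download this
# after logging in as the user.
PAGE = """
<html><body>
  <div class="header">Welcome back, Pat</div>
  <table>
    <tr><td class="label">Checking</td><td class="balance">$1,234.56</td></tr>
    <tr><td class="label">Savings</td><td class="balance">$9,876.54</td></tr>
  </table>
</body></html>
"""


class BalanceScraper(HTMLParser):
    """Collects the text of every <td class="balance"> cell."""

    def __init__(self):
        super().__init__()
        self._in_balance = False
        self.balances = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "td" and ("class", "balance") in attrs:
            self._in_balance = True

    def handle_data(self, data):
        if self._in_balance:
            self.balances.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_balance = False


def scrape_balances(html):
    """Parse the page and return only the balance strings."""
    parser = BalanceScraper()
    parser.feed(html)
    return parser.balances


print(scrape_balances(PAGE))
```

Note that the scraper has to process the whole page, greeting and layout included, just to pull out two values.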

Often this is benign. A company that wants to consolidate points and status for a person across all the person’s airline mileage accounts could deliver a point tracking portal. A financial planning company might want access to all a customer’s accounts so that a full financial picture can be seen in one single place.

In fact, the conflict between customers and scrapers on one side and data holders on the other is one driver of Open Banking regulations (such as the XS2A APIs in PSD2), which try to answer the question of who the data belongs to.

Why does this happen?

It happens because the data is useful and not available elsewhere.

It happens because the companies that have the data only see THEIR OWN POINT OF VIEW, but not the COMPLETE CUSTOMER POINT OF VIEW. As in my definition of digital transformation, they only consider their own process, not their customer’s experience.

Using my examples above… I would have to go to every airline and hotel website to check my point balance, or I can look at my point dashboard. I may want to use multiple banks AND also want to see a complete picture of my financial situation in one place.

Four problems with screen scraping

  1. Security risk. The screen scraper is given the user’s authentication information (by the user) and stores it (hopefully securely) and uses it to access the information provider’s site. In plain English, I would give the company creating my financial picture all the login information for each bank and financial company I use. That is a risk to the financial institution because credentials for accounts they own are stored on someone else’s infrastructure.
  2. Traffic overload. Screen scrapers “hit the website” as if they were a logged-in user. However, they are not human, so they can hit the website much more frequently, and they do so to stay up to date. They also download far more information than they need: they must fetch the whole page, including HTML/CSS and everything else on it, even if they just want a single line item, because pages of data (rather than specific data fields) are all they have access to.
  3. Time and money. Companies, especially banks, fight screen scraping with time and people (and technology). One wishes they would simply spend that time and money to create a great API. Though often, they cannot figure out the business justification.
  4. Bad customer experience. I’m sure there’s more but I can quickly think of three issues that impact customer experience:
    • There are errors because it is a hack. If the website changes even a little, the data may not be found until the screen scraper adapts. It’s a constant battle where the customer loses.
    • It’s slow because of #2 above. A lot of data must be downloaded and processed just to get at a few necessary bits. To stay up to date in case there are changes, that data must be downloaded frequently.
    • It stops working because it’s an us-vs-them situation, and the companies are working to prevent this from happening. When those companies are successful, it stops working for the customers.
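Two of the problems above, the oversized downloads and the breakage when a site changes, can be illustrated with a small sketch. The two page versions, the renamed `acct-total` class, and the JSON response shape are hypothetical, chosen only to show the effect.

```python
import json
import re

# Hypothetical markup before and after a cosmetic site redesign.
PAGE_V1 = (
    "<html><body><nav>Home | Accounts | Help</nav>"
    '<span class="balance">1234.56</span>'
    "<footer>Contact us</footer></body></html>"
)
PAGE_V2 = PAGE_V1.replace('class="balance"', 'class="acct-total"')

# Hypothetical API response carrying only the requested field.
API_RESPONSE = json.dumps({"balance": "1234.56"})


def scrape_balance(html):
    """Return the balance found in the page, or None if the markup changed."""
    match = re.search(r'<span class="balance">([^<]+)</span>', html)
    return match.group(1) if match else None


def api_balance(body):
    """Read the balance from a structured API response."""
    return json.loads(body)["balance"]


print(scrape_balance(PAGE_V1))   # found in the original layout
print(scrape_balance(PAGE_V2))   # None: the redesign renamed the class
print(api_balance(API_RESPONSE)) # unaffected by how the page looks
print(len(PAGE_V1), "bytes of page vs", len(API_RESPONSE), "bytes of JSON")
```

Even in this toy case the page is several times larger than the structured response, and a purely visual change is enough to make the scraper return nothing.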

It definitely creates an “us vs. them” situation (it’s my data, but I can’t get it… vs. the company that holds the data) at a time when many companies are trying to “be in this together” or deliver a “great experience.”

Customer experience should win out

Even though it’s the hardest to measure, customer experience might be the most critical driver for moving from an us-vs-them attitude toward an open API platform, even if there are open questions about how to measure the business justification:

I have seen examples where banks create apps (like for FX or treasury management) but do not provide access to the raw data through an API. The customer asks for access to the data, but the bank cannot figure out the ROI for doing so and does nothing. As such, there’s tension until the customer threatens to leave the bank and the bank begrudgingly relents.

That’s not the kind of provider I want to do business with… a begrudging one. Don’t be that partner. Be the one that has a “better together” approach.

What should companies do instead?

Create an Application Programming Interface, or API, with proper authentication. It allows users to exchange data in a secure and controlled environment, which is one of the core pillars of PSD2.

APIs resolve security and customer experience concerns and lower the burden on the provider’s web infrastructure (points #1, #2, and #4 above). By partnering with customers to give them access to the data, companies can figure out new business models and build better collaborative relationships to identify new needs and opportunities.
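As a rough illustration of why token-based API access addresses the credential-storage risk, the sketch below shows a toy in-memory token issuer. It is not an OAuth implementation, and `TokenStore`, the scope names, and the one-hour default lifetime are assumptions for the example. The point is that the third party holds a token that is limited in scope, expires, and can be revoked, rather than the user's password.

```python
import secrets
import time


class TokenStore:
    """Toy in-memory token issuer; a stand-in for a real authorization server."""

    def __init__(self):
        self._tokens = {}  # token -> (user, scopes, expiry timestamp)

    def issue(self, user, scopes, ttl_seconds=3600):
        """Create a random token tied to a user, a set of scopes, and an expiry."""
        token = secrets.token_urlsafe(32)
        self._tokens[token] = (user, frozenset(scopes), time.time() + ttl_seconds)
        return token

    def revoke(self, token):
        """Invalidate a token without touching the user's password."""
        self._tokens.pop(token, None)

    def check(self, token, required_scope):
        """Return the user if the token is valid for the scope, else None."""
        entry = self._tokens.get(token)
        if entry is None:
            return None
        user, scopes, expires_at = entry
        if time.time() >= expires_at or required_scope not in scopes:
            return None
        return user


store = TokenStore()
token = store.issue("pat", {"balances:read"})
print(store.check(token, "balances:read"))   # the user: read access was granted
print(store.check(token, "payments:write"))  # None: scope was never granted
store.revoke(token)
print(store.check(token, "balances:read"))   # None: access has been withdrawn
```

With screen scraping, by contrast, the only way to withdraw access is to change the password, and the scraper can do anything the user can do on the site.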

The net-new benefits of creating an API platform include:

  • Creating a managed ecosystem to capture value from fintechs and partners. Plaid is just one obvious example, but most innovators would rather not reinvent the wheel. Create something of value, expose it as an API, and others will build on top of your offering rather than rebuilding it.
  • Enabling solutions that are more valuable for customers because they integrate at a deeper technical level. These solutions are also “stickier” with customers because once the integration is complete, it often becomes a base-layer on which other value is built.
  • Enabling automation. We see a lot of companies talking about digitizing processes and automating repetitive tasks to increase efficiency. That’s just “fancy talk” for APIs. The key thing is that with a platform you empower those who are less technical to create orchestrations to solve their own efficiency aspirations. This last bit is important because Axway research has identified that 86% of IT leaders believe that IT should be spending more time enabling others to integrate for themselves.

Of course, there are technical answers about what should be done. However, it is more important to understand the fundamental cultural changes and the business transformation required to drive this new way of thinking about customers, experience, and creating compelling offerings.

Axway can help

Axway has built a team of industry leaders, called Catalysts, to help catalyze exactly this sort of change. The Catalysts work with customers in a variety of workshop formats, from executives to implementers, to help drive change and embrace the future.

Because, after all, it’s not really about your APIs or your API management platform, but about your people and enabling them to connect to customers around the value that you’re creating for them.