Thursday, 26 December 2013

Using the HubSpot API and CasperJS for Contact Data Scraping

We recently had a client that needed customer data from their web store to be accessible from their HubSpot account. They needed each person who ordered a product to be put in HubSpot as a Contact, along with the customer’s order number, purchase date, price, and a list of products that were ordered.

Typically, a developer would incorporate the HubSpot API into the web store code natively. In this case, the client’s web store provider is located in a country many time zones away, making it difficult to solve problems outside of basic web store functions. Additionally, the web store platform does not have an API that would allow us to export data in a machine-readable format.

As a HubSpot and inbound marketing partner for the client, we decided to bypass the third-party development firm entirely by writing scripts to scrape data from the web store and send that data to HubSpot. Today, these scripts are hosted on the server and run daily, automatically scraping and importing data from the previous day’s orders.

This method requires two components: a web scraper and a script that can push data to HubSpot using their new Contacts API.

Web Scraper

The web scraper uses CasperJS to authenticate with the web store through a headless browser, navigate to the recent orders screen, and enter date filters. Our only difficulty was working around the antiquated and non-semantic web store markup to programmatically select the correct buttons and tables. In fact, we assumed writing the scraper would be the hardest part of the project, but we were pleasantly surprised by the simplicity and reliability of CasperJS. We chose to output the data in CSV format to standard output, so the data could be piped to a CSV file on the server, allowing a separate script to feed the data into HubSpot.
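The CSV hand-off can be illustrated with a minimal sketch. It is written in Python rather than in the CasperJS the scraper itself uses, and it is not the original script. The column names, the `write_orders` helper, and the sample order are all assumptions for illustration, loosely based on the fields the client asked for; the real scraper's layout may differ.

```python
import csv
import sys

# Hypothetical column layout; the original scraper's columns are not shown.
FIELDS = ["email", "order_number", "purchase_date", "price", "products"]


def write_orders(orders, out=sys.stdout):
    """Write scraped orders as CSV rows, one order per line."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for order in orders:
        row = dict(order)
        # Flatten the product list into one cell so each order stays on one row.
        row["products"] = "; ".join(order["products"])
        writer.writerow(row)


if __name__ == "__main__":
    # Made-up sample order, only to show the output shape.
    sample = [{
        "email": "jane@example.com",
        "order_number": "1001",
        "purchase_date": "2013-12-25",
        "price": "49.90",
        "products": ["Widget", "Gadget, large"],
    }]
    write_orders(sample)
```

Because the rows go to standard output, the caller decides where they end up, for example by redirecting the output to a file that the import script reads later. Using a CSV writer instead of joining strings by hand means that values containing commas are quoted correctly.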

HubSpot Contacts API

This part ended up being much harder than it needed to be. HubSpot has made a few changes to their API recently, and we were not sure which parts needed to be used and which parts were set to be deprecated. Initially, we chose to use the HubSpot PHP API Wrapper – haPiHP with the Leads API component. This requires that a custom API endpoint be created on HubSpot, which they call a form. Using this API, data can be posted to the endpoint in key-value pairs, which the form will accept and convert into Leads.

Ideally, the scripts run once a day and post data from the previous day’s orders, but we ran into a problem with the initial post. Since the web store does not have an export function, we had to use the script to access all the data from previous sales. After running the script on a few hundred orders, HubSpot informed us that Leads were being created by sending us email notifications: over 150,000 of them.

Unfortunately, each email contained a Lead with blank data, so the necessary data was not pushed into HubSpot. On top of that, the flood of notifications left our email provider with no option but to queue all emails from HubSpot, so we were not able to communicate with them via email for a few days. At first, we assumed that a job had been corrupted on their end and that there would be no end to the emails. After a phone call with the HubSpot development team, we were convinced that the emails would stop and that we actually needed to switch away from the Leads API to the Contacts API. We also learned that the Leads API is asynchronous and the Contacts API is not, so with the Contacts API we can see immediately whether the data was posted correctly. Best of all, there is no email notification when a Contact is created through the Contacts API.

In trying to switch to the other API calls, we found two issues. First, we had been using the custom form API endpoint on a number of projects, and it was unclear whether that part of the API was slated to be deprecated.

After some back and forth with the HubSpot dev team, we learned this:

    I would encourage you not to use those endpoints to push data in, unless that data is form submission which you are capturing. If you simply want to sync data in from one DB to the other, I strongly encourage you to use the “add contact” and “update contact” API methods.

    The custom endpoints won’t be going away per se, and there are newer versions of that process in the Forms API, but it’s not really the intended use.

So we will continue using the custom form endpoint to push data in until it stops working … per se.

The second issue we encountered was that, of the two API key generators in HubSpot, one of them does not work with the Contacts API, and the other is hidden. In the client’s main HubSpot portal, you can generate a token by clicking:

Your Name → Settings → API Access

The token provided will not allow the use of the Contacts API, and the PHP wrapper returns a message that the key is not valid.

After more back and forth with the HubSpot dev team, we learned that the required key can be found by going to https://app.hubspot.com/keys/get. There is no link to this page in the client’s main HubSpot portal, which was causing a lot of confusion.

Wrapping Up

From here, the process was pretty simple. Unlike with the Leads API, a Contact will be rejected if it already exists, so we had to implement a simple Create or Update method, which looks something like this: HubSpot Contacts API – Create or Update. Once the two scripts were in place on the server, we set a cron job to run the scraper and pipe the output to a CSV file. Once that completes, the PHP script runs and pushes the data to HubSpot.
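The create-or-update logic described above can be sketched as follows. This is not the original PHP method, and it does not call HubSpot: the `ContactExists` exception, the `create_contact` and `update_contact` methods, and the `InMemoryClient` stand-in are all hypothetical names introduced for illustration. A real implementation would map them onto whatever the API wrapper in use provides for adding a contact, updating a contact, and reporting that a contact already exists.

```python
class ContactExists(Exception):
    """Raised by the (hypothetical) client when the contact already exists."""


def create_or_update(client, email, properties):
    """Create a contact, falling back to an update if it already exists.

    Returns "created" or "updated" so the caller can log what happened.
    """
    try:
        client.create_contact(email, properties)
        return "created"
    except ContactExists:
        # The create call was rejected, so push the same data as an update.
        client.update_contact(email, properties)
        return "updated"


class InMemoryClient:
    """Stand-in for a real API wrapper, used only to exercise the logic."""

    def __init__(self):
        self.contacts = {}

    def create_contact(self, email, properties):
        if email in self.contacts:
            raise ContactExists(email)
        self.contacts[email] = dict(properties)

    def update_contact(self, email, properties):
        self.contacts[email].update(properties)


if __name__ == "__main__":
    client = InMemoryClient()
    # First call creates the contact; second call hits the update path.
    print(create_or_update(client, "jane@example.com", {"order_number": "1001"}))
    print(create_or_update(client, "jane@example.com", {"order_number": "1002"}))
```

Trying the create first and updating only on rejection keeps the common case (a new customer) to a single call. Because the call is synchronous, the return value can be logged per CSV row to confirm what was pushed.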

Source:http://www.sailabs.co/using-the-hubspot-api-and-casperjs-for-contact-data-scraping-474/
