How To Automate Keyword Research With APIs & Python Scripts


Introduction:

It’s time for some more SEO tips!

The latest ‘Words from the Wise SEO’ blog post is an interesting read from Paul DeMott – about how you can cut your keyword research down to just ten minutes! Impressive.

Find out how you can start automating your keyword research to save a bunch of time that you can spend wisely elsewhere.

Leave us a comment if you have any points to add!

 


 


Introduction

My name is Paul DeMott and I’m the CTO of Helium SEO, a fast-growing Cincinnati SEO company. I’ve been in the SEO game for over a decade, and over the last few years I’ve focused specifically on building software and AI for SEO. Today I will share with you one of the key processes we’ve automated.

Disclaimer: Our agency has access to many tools that are not available to everyone. There are many free substitutes for these tools and I hope to update the article with these as I work on replicating this system with free tools.

Automating keyword research is a tall order, but if you are able to do it, it can save hours of work. We took our process from 2 hours down to 10 minutes by strategically programming parts of the process. What makes it tricky, like so many parts of SEO, is that there is a large subjective component. Computers rely on instructions and rules to make decisions, and a keyword that can look great to a computer can look bad to a human.

Our team spent a lot of time brainstorming as well as making revisions to this process. I will do my best to walk you through the whole process.

Our goal was to build software that could find keywords with the following characteristics:

  1. High buyer intent (the traffic it drives should produce leads, not just visits; note that keyword volume is not the priority)
  2. Trending upwards (we expect this query to get more searches over time)
  3. Low difficulty (the easier it is for us to rank and get our clients leads, the better)
  4. Fits well into our client’s strategy and the products they offer

 

Do not use a process like this if your client already knows their keywords.

 

Sometimes keyword research is easy and there are a few keywords that can be knocked out without revving up our software. For example, some of our clients have run AdWords for years and have statistically significant data on which keywords drive leads. If your client has this, it should be a no-brainer which keywords to pick. Often this data is not available, and that’s when you need to dive a bit deeper.

Quantifying the Keyword Goal

The true challenge of this project, as mentioned above, is successfully quantifying what a good keyword looks like.

Through trial and error we’ve found there is no single tool that works best for keyword research.

Each tool has its own strengths, but making use of them requires understanding what the tool is trying to measure.

In the following sections I’ll elaborate on which metrics we ended up selecting and why. I’m a strong believer that understanding the why is necessary to implementing any system.

Ahrefs Keyword Difficulty

Ahrefs claims that its KD metric has the highest correlation with actual search results. (Keep in mind that Long Tail Pro’s algorithm has improved significantly since this post was created.)

Ahrefs’ KD metric correlates strongly with real results because backlinks still carry a huge weight as a ranking factor. KD works by averaging the number of referring domains pointing to each page in the top 10.

This is helpful, but also limiting. On one hand, assuming links of equal quality, this score can give you a good estimate of the number of incoming links required to get to page 1. On the other hand, in many cases it’s not accurate to assume that all incoming links are of equal quality.

A low difficulty rating can mean that it is easy to get onto page 1 but still very hard to get into the top 3 results, let alone spot 1.

Additionally, some of the most competitive keyword searches get assigned a low difficulty score because the ranking pages have few referring domains; however, the sites ranking on page 1 are massive authority websites that pass around a lot of power through strong internal linking schemes.
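To make the limitation concrete, here is a minimal sketch of the averaging step described above. It is not Ahrefs’ actual formula, only the plain mean of referring domains, and the SERP numbers are made up for illustration.

```python
from statistics import mean


def average_referring_domains(referring_domains):
    """Plain mean of referring domains across the pages in the top 10."""
    return mean(referring_domains)


# Hypothetical SERP: the ranking pages themselves have very few referring domains...
serp_ref_domains = [3, 0, 5, 2, 1, 0, 4, 2, 1, 2]
# ...but they sit on very strong domains (hypothetical DR values).
serp_domain_ratings = [92, 88, 95, 90, 85, 91, 89, 93, 87, 90]

avg_links = average_referring_domains(serp_ref_domains)
avg_dr = mean(serp_domain_ratings)

print(f"average referring domains: {avg_links}, average DR: {avg_dr}")
```

A link-count average alone makes this keyword look easy, while the domain ratings suggest the opposite, which is why a second metric is needed.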

Long Tail Pro KC

Long Tail Pro KC is a complementary tool to Ahrefs’ KD and DR.

At a high level, Long Tail Pro KC factors in domain authority, incoming links, keyword position in titles and Majestic Trust Flow, and then weights each of these values to arrive at a KC score. Our agency uses the chart below to interpret the KC value, and we’ve found this chart to be very accurate across the thousands of keywords we have ranked.

Unlike Ahrefs’ KD, which doesn’t give us specifics about the quality of links, Long Tail Pro’s KC does.

Additionally, one of my favorite features of Long Tail Pro is that it can provide insights into internal linking and site age.

By clicking on the keyword you can pull up a more complete view of the SERPs for it.

SEOs often overlook internal links even though they can carry as much power as referring domains.

PRO-Tip: Pay attention to the site ages in the top 10. One or two young sites in the top 10 can be a good indication that you can rank for the keyword quickly.

Ahrefs DR

Domain Rating serves one purpose in our keyword process. Sometimes you’ll encounter keywords that look great but in actuality would be near impossible to rank for. With DR we can catch these false positives.

For example, in the following scenario the keyword has a KD of 0, but when you look at the DRs of the ranking sites the keyword is much more challenging.

Why does this happen? A very authoritative site passes incredible amounts of link equity to every site it links to, including its own internal pages. This article is worth reading to better understand it (link here).

Why DR instead of DA? Ahrefs simply has a much better crawler than Moz. It’s second in size only to Google’s.

Checking the Top 3 Spots

Unfortunately, KD and KC just tell us the difficulty of getting onto page 1. If we want to know how hard it is to rank in the top 3 spots we need to look at the DR of these sites as well as their referring links.

We have not yet automated this process, but will in the future.

Automating the Process

Now that we have defined the metrics we are looking for, let me walk you through the automation process.

 

Tools required:

  • Semrush
  • keywordtool.io
  • Selenium
  • Pikotocharts

 

Phases

  • Keyword Expansion
  • Culling
  • Graphically representing
  • Manual Review Stage

 

Keyword Expansion

Keyword expansion is one of the most important pieces of this process. The good news is, even if you are not a developer you can still use this process, and while it may not take 10 minutes and be done automatically for you, it is still a systematic way to find good keywords.

During the expansion phase the most common issue we ran into was not casting a wide enough net to catch enough good keywords. We tried many different tools but here is our finalized process.

To run these steps programmatically we use a browser automation tool called Selenium. It drives a browser and interacts with each page just as a human would. (If there is interest in how to use this tool I’d happily write another article.) Lastly, if you have API access to any of these tools, it speeds up this process.

  1. Collect seed keywords from the customer. Make sure you have a procedure for learning which keywords your customer wants.
  2. Type the keywords into Google and grab the top 5 results. We use a Python Google API to automatically grab these 5 pages.
  3. In Semrush, pull the list of keywords for which each of the 5 competitors ranks in positions 1-20.
  4. Place the seed keywords in keywordtool.io. This tool is great because, unlike other tools, it grabs a high diversity of unique related keywords. They even have a free version.
  5. Combine the list generated by keywordtool.io with the lists from the 5 Semrush competitors. Just save the search volume and keyword name.
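The combining step can be sketched in a few lines of Python. This is a minimal illustration rather than our internal script: the function name, the sample rows, and the choice to keep the largest reported volume when a keyword appears in more than one export are all assumptions.

```python
def combine_keyword_lists(*sources):
    """Merge (keyword, volume) rows from several exports into one dict.

    Keywords are lower-cased and whitespace-normalised so that the same
    phrase from two tools collapses into a single entry.
    """
    combined = {}
    for rows in sources:
        for keyword, volume in rows:
            key = " ".join(keyword.lower().split())
            # Assumption: when tools disagree, keep the larger volume.
            combined[key] = max(volume, combined.get(key, 0))
    return combined


# Hypothetical exports; real ones would be read from CSV files.
semrush_rows = [("Cincinnati SEO", 480), ("seo company  cincinnati", 210)]
keywordtool_rows = [("cincinnati seo", 390), ("seo agency near me", 1300)]

merged = combine_keyword_lists(semrush_rows, keywordtool_rows)
for keyword, volume in merged.items():
    print(keyword, volume)
```

Only the keyword name and search volume are carried forward, which matches what step 5 asks you to save.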

Curating the List

Now that we have a list of close to 1000 keywords, we begin trying to find the needle in the haystack. (Note: We have written an internal script to apply the following changes. Python has a few nice libraries for dealing with CSV data.)

  1. Cut all keywords with fewer than 10 searches per month.
  2. Copy the list into Long Tail Pro. Cut all keywords that have a KC > 40, and save the CPC and whether or not site links appear for the search query.
  3. Cut all keywords that have a KD > 20.
  4. Sort by KD in ascending order: the idea here is that we are looking for the least competitive keywords that still meet our qualifications, so we want the lowest difficulty ratings at the top of the list.
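The culling rules above translate almost directly into code. The sketch below is an illustration, not our internal script; the `Keyword` class, the `cull` function, and the sample data are hypothetical, while the thresholds come from the steps above. It puts the lowest-difficulty survivors first, since those are the ones we want to prioritize.

```python
from dataclasses import dataclass


@dataclass
class Keyword:
    name: str
    volume: int  # searches per month
    kc: float    # Long Tail Pro KC
    kd: float    # Ahrefs KD


def cull(keywords, min_volume=10, max_kc=40, max_kd=20):
    """Drop keywords that fail any threshold, then list the easiest first."""
    kept = [
        k for k in keywords
        if k.volume >= min_volume and k.kc <= max_kc and k.kd <= max_kd
    ]
    return sorted(kept, key=lambda k: k.kd)


# Hypothetical sample data for illustration only.
candidates = [
    Keyword("emergency plumber cincinnati", volume=90, kc=22, kd=8),
    Keyword("plumber", volume=40000, kc=65, kd=55),
    Keyword("tankless water heater install", volume=5, kc=12, kd=3),
    Keyword("drain cleaning service", volume=320, kc=35, kd=2),
    Keyword("sewer line repair cost", volume=150, kc=38, kd=21),
]

shortlist = cull(candidates)
for k in shortlist:
    print(k.name, k.kd)
```

Keeping the thresholds as parameters makes it easy to loosen them for niches where very few keywords survive.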

Check Google Trends Data

We built a tool that can take our final list of keywords and see whether they are trending upwards in popularity and then plot all of our relevant data. (I’ll be updating this content with a download link soon.)
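One simple way to decide whether a keyword is trending upwards is to fit a straight line through its interest-over-time values and look at the sign of the slope. The sketch below is not the tool mentioned above; it assumes you already have a list of evenly spaced interest values for each keyword (for example, weekly numbers exported from Google Trends), and the function name and sample series are hypothetical.

```python
def trend_slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...).

    A positive result means interest is rising over the period,
    a negative result means it is falling.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


# Hypothetical interest-over-time series.
rising = [10, 20, 30, 40]
falling = [40, 30, 20, 10]

print(trend_slope(rising), trend_slope(falling))
```

The slope depends on the scale of the input, so compare keywords only when their series use the same scale and time period.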

Plot

Lastly, we take all of the keywords and plot them on a 2-dimensional graph.

These plots are much more aesthetically pleasing to look at than Excel tables and they make it easier for a human to identify which keywords fit a list of criteria.

While it may be easy to sort an Excel sheet by a single column, such as search volume, it’s more challenging to compare several fields at once, such as high search volume, low difficulty and an upward trend in popularity. These visualizations simplify the process.

Visualize the Information

Red = SERPs had AdWords ads

Blue = SERPs did not have AdWords ads

Circle Size = The keyword Volume

In this chart we express buyer intent by identifying whether or not the keyword had AdWords ads on the SERP. It’s not a perfect method, but it at least helps pick out good keywords. The x-axis is keyword difficulty and the y-axis shows the slope of the keyword’s trend (whether it has become more or less popular).
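A chart with this encoding can be approximated with a scatter plot. The sketch below uses matplotlib, which is an assumption and not necessarily the tool behind the chart described here; the row fields, function names, and `size_scale` factor are hypothetical, and it does not reproduce the hover details.

```python
def to_plot_points(rows, size_scale=0.5):
    """Turn keyword rows into the x, y, size and colour lists for a scatter plot."""
    xs = [row["kd"] for row in rows]        # x-axis: keyword difficulty
    ys = [row["slope"] for row in rows]     # y-axis: trend slope
    sizes = [row["volume"] * size_scale for row in rows]  # circle area ~ volume
    colors = ["red" if row["has_ads"] else "blue" for row in rows]
    return xs, ys, sizes, colors


def draw(xs, ys, sizes, colors):
    """Render the scatter plot; matplotlib is imported only when drawing."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(xs, ys, s=sizes, c=colors, alpha=0.5)
    ax.axhline(0, color="grey", linewidth=0.8)  # above this line = trending up
    ax.set_xlabel("Keyword difficulty")
    ax.set_ylabel("Trend slope")
    plt.show()


# Hypothetical rows for illustration only.
rows = [
    {"keyword": "drain cleaning service", "kd": 2, "slope": 0.8,
     "volume": 320, "has_ads": True},
    {"keyword": "plumbing history", "kd": 12, "slope": -0.3,
     "volume": 90, "has_ads": False},
]

xs, ys, sizes, colors = to_plot_points(rows)
# draw(xs, ys, sizes, colors)  # uncomment to render; requires matplotlib
```

Separating the data preparation from the drawing keeps the encoding rules easy to check without opening a plot window.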

Lastly, if we hover over a keyword we can see specifics. I also have the Semrush difficulty metric displayed, however I don’t find it very useful.

The ideal keywords have the following traits:

  1. Large circle
  2. Red Color
  3. Low difficulty
  4. High CPC
  5. Positive Slope (upward trending)

Final Step

The final results of this process are 10-100 good potential keywords. Once we have them we throw them back into Long Tail Pro and Ahrefs to do one final check. This is the point where we double check for false positives.

Additionally, if the DR is above 45 (our threshold for differentiating an authority site from a normal site) for every domain in the top 10, and our client’s DR is below 45, we may put these keywords on the back-burner.

It may still be possible to rank for them, but it would likely take us longer to generate leads from them.
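The back-burner rule is simple enough to express as a small check. This is a minimal sketch with a hypothetical function name and made-up DR values; the threshold of 45 comes from the rule above.

```python
AUTHORITY_DR = 45  # threshold separating authority sites from normal sites


def should_backburner(top10_drs, client_dr, threshold=AUTHORITY_DR):
    """True when every top-10 domain is an authority site and the client is not."""
    all_authority = bool(top10_drs) and all(dr > threshold for dr in top10_drs)
    return all_authority and client_dr < threshold


# Hypothetical DR values for the domains ranking in the top 10.
strong_serp = [60, 72, 55, 48, 90, 66, 51, 70, 58, 49]
mixed_serp = [60, 72, 55, 40, 90, 66, 51, 70, 58, 49]

print(should_backburner(strong_serp, client_dr=30))
print(should_backburner(mixed_serp, client_dr=30))
```

A single weaker domain in the top 10 is enough to keep the keyword in play, which mirrors the "every domain" wording of the rule.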

I hope this process is helpful and would love to hear your guys’ feedback!

Thanks for taking the time to read the SEOButler blog today – how has this post helped you?

Check back next week for more!

 


 

If you want to see your blog featured here, get in touch. We’re always looking for new contributions. Send in your pitch via our Contact Page and your post could be next!
