One of the things an SEO tool does is pull the top Google results for a keyword and show the words they use most frequently. This is presumably one of the factors Google (the biggest search engine, in case you didn’t know :P) uses to decide whether your newly written content is relevant and worthy of competing with the pages that already rank at the top.
These keywords act as a signal to Google about which content is relevant for the user.
Reverse engineering a content research SEO tool
For this case, I want to replicate what tools like SurferSEO, Frase.io, PageOptimizerPro and similar do. I will share the overview step by step, not the actual code (maybe I’ll share it later in another post).
Disclaimer: These tools have many features beyond the one we’re going to cover here. This is just one of them :)
The pseudocode for a content research SEO tool
Here is what I came up with:
- Get the top 5 or top 10 (it’s up to you!) Google search results (by scraping directly or via a SERP API)
- Collect the list of URLs (so we can scrape and analyze the content later)
- Loop through the URLs and scrape each one
- Collect the headings (h1, h2 and h3)
- Get the main content
- Count the words in the main content
- Find the most frequently used words (10-15 words) and collect both the words and their counts
- Use a natural language processing (NLP) API or library to find synonyms of those words (to create variations)
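To make the heading step a bit more concrete, here is a minimal sketch using only Python’s standard library. It assumes you already have the page HTML as a string; the names `HeadingCollector` and `collect_headings` are made up for this example, and a dedicated HTML parsing library would work just as well.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects the text of every h1, h2 and h3 tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.headings = []    # list of (tag, text) tuples, in document order
        self._current = None  # heading tag we are currently inside, if any
        self._buffer = []     # text pieces seen inside the current heading

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Join the pieces and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.headings.append((tag, text))
            self._current = None


def collect_headings(html):
    """Return [(tag, text), ...] for all h1/h2/h3 headings in the HTML string."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    return parser.headings
```

You would call `collect_headings(page_html)` once per scraped URL and keep the result next to that URL.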
Usage flow Idea
- User input keyword on
- If you change the search country often, add a dropdown for the country/language list. If not, you can set the country once in the backend
- Submit the keyword, which sends an AJAX request to the actions above (in the pseudocode)
- (Optional) you can let the user choose which URLs to scrape before continuing to the scraping step
- Show all the results
- Display h1/h2/h3 list as inspiration for your own content
- Show the word count/length for each article
- (Optional) you can show an average for each word (sum the word’s counts across all articles and divide by the number of URLs you scraped)
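The optional averaging step could look roughly like this. It is only a sketch: it assumes you keep one `Counter` of word counts per scraped URL, and `average_word_counts` is a hypothetical helper name.

```python
from collections import Counter


def average_word_counts(per_url_counts):
    """Average how often each word appears across all scraped articles.

    per_url_counts: a list with one Counter (word -> count) per URL.
    A word missing from an article counts as 0 for that article.
    """
    if not per_url_counts:
        return {}
    totals = Counter()
    for counts in per_url_counts:
        totals.update(counts)  # adds the counts together, word by word
    n = len(per_url_counts)
    return {word: total / n for word, total in totals.items()}
```

For example, a word that appears 4 times in one article and 2 times in another averages to 3.0 over two URLs.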
Even though it’s not perfect, you can use:
- The most frequent words + how many times each appeared
- The word count/length of each article
as your basis for creating content. You can aim to include these keywords in your article.
- When scraping Google, make sure it’s for the correct country (domain) and language, since each country and language returns different results.
- Set the User-Agent header to the device you want (mobile or other), depending on which results you want to see
- If you build a UI (user interface), you can optionally let the user (you) select which URLs to scrape after getting the search results. For example, Google sometimes shows a result from YouTube or Twitter, which is not an actual blog post.
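As a rough illustration of the country, language and device tips, the sketch below only builds the request and does not send it. It assumes Google’s `gl` (country) and `hl` (language) query parameters; the function name and the sample mobile User-Agent string are placeholders for this example. If you use a SERP API instead, it will have its own parameters for this.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Example value only: swap in the User-Agent of the device you want to simulate.
MOBILE_USER_AGENT = (
    "Mozilla/5.0 (Linux; Android 13; Pixel 7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36"
)


def build_search_request(keyword, country="us", language="en",
                         user_agent=MOBILE_USER_AGENT):
    """Build (but do not send) a Google search request for one keyword."""
    params = {"q": keyword, "gl": country, "hl": language}
    url = "https://www.google.com/search?" + urlencode(params)
    return Request(url, headers={"User-Agent": user_agent})
```

Keeping `country` and `language` as arguments makes it easy to wire them to a dropdown later, or to hard-code them once in the backend.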
Before scraping each site, you can:
- Get the title and meta description as inspiration (we can use and analyze them later)
- Get the related questions on Google (so you can answer them as part of your article)
- Get keyword autocomplete suggestions (to use as long-tail keywords)
More tips on Google SERP: Find hidden gems in google search result for SEO
Challenges on creating an SEO research tool
These are the challenges I met along the way.
Blocked by site
When scraping a site, there’s a good chance it already has some protection against scraping. Things that can help:
- Set a User-Agent header so your request looks like it comes from a regular browser rather than a robot.
- Use a proxy IP address (there are many services/APIs for this)
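Both ideas can be combined in one place. The sketch below uses Python’s standard `urllib.request`; `make_opener` is a hypothetical helper, and the proxy URL is whatever your proxy service gives you. Nothing is sent until you call `opener.open(url)`.

```python
from urllib.request import ProxyHandler, build_opener


def make_opener(user_agent, proxy_url=None):
    """Create a URL opener that sends a custom User-Agent, optionally via a proxy."""
    handlers = []
    if proxy_url:
        # Route both http and https traffic through the same proxy.
        handlers.append(ProxyHandler({"http": proxy_url, "https": proxy_url}))
    opener = build_opener(*handlers)
    # Replace the default "Python-urllib/x.y" User-Agent header.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener
```

Usage would be something like `make_opener(my_user_agent, "http://127.0.0.1:8080").open(url, timeout=10)`, where the address is just a placeholder.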
Get the main content
Getting the correct main content is not very straightforward, since websites have different layouts and not every site uses the HTML main tag.
For this, you can look for an “article scraper”, like:
- newspaper3k in Python
- Zyte API
Find the benchmark here
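If you want to see why this is tricky before reaching for one of those tools, here is a deliberately naive fallback using only the standard library. It is not how newspaper3k or Zyte work; it simply assumes the content lives inside a main or article tag and otherwise falls back to all text outside obvious non-content tags. All names here are made up for the example.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"head", "script", "style", "nav", "header", "footer", "aside"}
MAIN_TAGS = {"main", "article"}


class MainTextExtractor(HTMLParser):
    """Naive heuristic: prefer text inside <main>/<article>, skip page chrome."""

    def __init__(self):
        super().__init__()
        self.main_parts = []  # text found inside <main> or <article>
        self.all_parts = []   # fallback: all text outside SKIP_TAGS
        self._skip_depth = 0
        self._main_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in MAIN_TAGS:
            self._main_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in MAIN_TAGS and self._main_depth:
            self._main_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth or not text:
            return
        self.all_parts.append(text)
        if self._main_depth:
            self.main_parts.append(text)


def extract_main_text(html):
    """Return the best-guess main text of an HTML page as one string."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.main_parts or parser.all_parts)
```

Note that this also drops anything inside a header tag, even when it sits within the article, which is one of the many reasons a dedicated article scraper tends to do better.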
Frequently asked questions
How can I get the most frequent keywords?
After getting the main content (do that first), you can split the text into words on whitespace (” ”). Then loop through the words and keep a count for each one in an array or map, incrementing the count every time the word appears again.
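In Python, that counting loop is mostly handled by `collections.Counter`. The sketch below goes slightly beyond a plain split on spaces: it lowercases the text, uses a regular expression so punctuation doesn’t stick to words, and drops a tiny stop-word list. The stop-word list and the function name are assumptions for this example, so adjust them to your language and needs.

```python
import re
from collections import Counter

# Tiny example list; a real tool would use a fuller list per language.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on", "with"}


def most_frequent_words(text, top_n=10, stop_words=STOP_WORDS):
    """Return the top_n (word, count) pairs from the text, most frequent first."""
    # Lowercase, then pull out runs of letters, digits and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words and len(w) > 1)
    return counts.most_common(top_n)
```

The total word count for the article is simpler still: `len(text.split())`.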
Who is this article for?
For developers who love the SEO world. For marketers who want to build their own SEO tool with help from an SEO friend/freelancer.
What is SEO (search engine optimization)?
It’s how we bring our websites into the top results, or page one, of a search engine like Google. There are many SEO strategies we could pursue; one of them is providing the search engine with content that is similar to the current top results, plus your own unique perspective.
What other features SEO tools have?
If you want to explore more + DIY your own SEO tool, here’s what they normally have:
- backlink analysis
- search volume for each page or competitor’s page
- connect with Google analytics or Google search console (Google has API for this)
- Get SERP results (search engine results pages)
- Get what’s trending on search engines
If you’re interested in exploring this, you could ask fellow content marketers / SEO experts what tools they wish existed.