In my book Understanding SEO, I only touched upon the logic of list landing page hashes. More about this simple and powerful concept in this article.
What is a targeted list landing page?
A targeted page is an HTML page that wants to get found for a specific search intent, e.g., “cheap apartments Manhattan.”
Somebody searching for “cheap apartments manhattan” does not have one specific flat in mind but wants to see a list of cheap apartments in Manhattan. To fulfill their search intent, we need to show them a page with a list of cheap apartments in Manhattan, a.k.a. “query deserves list.” That page is a targeted list page, or targeted list landing page: a list page that wants to get found for a specific search intent.
Other example queries that have list search intents:
- “PHP jobs Seattle”
- “vegan restaurants little rock Arkansas”
- “tennis courts Brighton”
- “online yoga classes”
- and so on
What’s the challenge with list landing pages?
The thing is, your web property probably wants to get found not only for “cheap apartments manhattan” but also for “apartments manhattan,” “ground floor apartments manhattan,” “one-bedroom apartments manhattan,” “cheap apartments new york”… and a few hundred thousand or even millions of additional combinations.
So what happens if all cheap apartments in Manhattan are on the ground floor, or what happens if all cheap apartments are one-bedroom apartments in your database? The query that populates your list pages has a high chance of returning the same items, meaning the lists have a high chance of having the same main content, a.k.a. internal duplicate content.
What happens if you only have 10 apartments in New York, all of them in Manhattan, cheap, on the ground floor, and with one bedroom? All of these list pages, including the one targeted at the search intent with the highest search demand, “apartments New York,” would show the same 10 results.
List pages that visibly show the same lists, regardless of sorting, are internal duplicate content and lead to internal SEO competition. As all pages show the same content, Google does not know where to send the traffic. Google lets them compete against each other to determine the winner (which is not necessarily the page you would have chosen) and might even de-index some of them. The result is general underperformance in Google. Internal competition is annoying but no big deal when it happens occasionally. It becomes a growing issue (growing underperformance) if it happens at scale: if you have logic in place that creates more and more list landing pages over time and does not take internal duplicate lists into account.
What is a list landing page hash?
A list landing page hash is a
- sort independent,
- comparable identifier
- of the visible items
- on a list landing page.
OK, let’s say “cheap apartments manhattan” (let’s call this page P1) lists the items
- A
- B
- C
- D
“apartments manhattan” (P2) lists
- A
- B
- C
- D
- E
“one bedroom apartments manhattan” (P3) lists
- A
- B
- D
- C
Let’s take the identifiers of the listed items (A, B, C, D, E), sort them (ascending or descending doesn’t matter, as long as it is the same for every page), and concatenate them.
This results in these hashes:
- P1: a-b-c-d
- P2: a-b-c-d-e
- P3: a-b-c-d
This means P1 and P3 are equal. P2 is different from P1 and P3. We can even say that P2 is 20% different from P1 and P3, as one of its five items (E) does not appear on them.
Note: you only create list landing page hashes for the first page and the list items visible there, not for paginated pages. Paginated pages (page 2, page 3, …) are never targeted landing pages.
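The sort-and-concatenate step can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: the function names `list_hash` and `difference` are my own, and the difference measure (items not shared, divided by all items on either list) is just one way to arrive at the 20% from the example.

```python
def list_hash(item_ids):
    """Sort-independent identifier of the items visible on a list page."""
    # Normalize, de-duplicate, and sort so the display order does not matter.
    return "-".join(sorted({str(i).lower() for i in item_ids}))


def difference(items_a, items_b):
    """Share (0.0 to 1.0) of items that the two lists do not have in common."""
    # Assumption: items on only one list, divided by items on either list.
    a = {str(i).lower() for i in items_a}
    b = {str(i).lower() for i in items_b}
    union = a | b
    return len(a ^ b) / len(union) if union else 0.0


# The three example pages, in the order the items are displayed.
pages = {
    "P1": ["A", "B", "C", "D"],
    "P2": ["A", "B", "C", "D", "E"],
    "P3": ["A", "B", "D", "C"],
}

hashes = {page: list_hash(items) for page, items in pages.items()}
print(hashes)  # {'P1': 'a-b-c-d', 'P2': 'a-b-c-d-e', 'P3': 'a-b-c-d'}
print(hashes["P1"] == hashes["P3"])  # True
print(difference(pages["P1"], pages["P2"]))  # 0.2
```

With real item IDs and long lists, the concatenated string can get unwieldy. Storing a digest of it instead (for example via Python’s `hashlib`) would keep equality checks cheap, at the price of no longer being able to read the items from the hash.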
What to do with list landing page hashes?
There are two strategies: quality assurance or choosing a winner.
Quality assurance
You calculate the hashes once per day (most of the time in the same process that creates the sitemap.xml).
The simplest case of quality assurance is to check if there are collisions. Look at the daily report and see how many collisions you have. If collisions occur in 1% of the cases, ignore them; if they occur in 20% of the cases, act now. Between 1% and 20% is your wiggle room. I personally see everything above 5% as a high-priority area of optimization. Anything above 20% is critical, with a high risk of self-spam.
The next step is to act based on the daily hash collision report, best explained via an example: I had a case where we had constant list landing page collisions in the “men’s shoes” categories. “Men’s shoes” had a hash collision with its subcategory “men’s sports shoes,” as all of the listed “men’s shoes” were “sports shoes.” We solved this by making sure that we always had one “men’s shoes” offer that was not a “sports shoes” offer, and we changed the listing logic so that all categories must list at least one item from their subcategories.
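A daily collision report could look roughly like the sketch below. The assumptions are mine: “collision rate” is taken to mean the share of list pages that share their hash with at least one other page, the exact boundary handling of the 1%, 5%, and 20% thresholds is an interpretation, and the URLs are hypothetical.

```python
from collections import defaultdict


def collision_report(page_hashes):
    """Group URLs by hash; return (colliding groups, collision rate)."""
    groups = defaultdict(list)
    for url, page_hash in page_hashes.items():
        groups[page_hash].append(url)
    # A collision is any hash that belongs to more than one URL.
    collisions = {h: urls for h, urls in groups.items() if len(urls) > 1}
    colliding_pages = sum(len(urls) for urls in collisions.values())
    rate = colliding_pages / len(page_hashes) if page_hashes else 0.0
    return collisions, rate


def priority(rate):
    """Map a collision rate to the rough thresholds from the text."""
    if rate >= 0.20:
        return "critical"
    if rate > 0.05:
        return "high priority"
    if rate > 0.01:
        return "wiggle room"
    return "ignore"


# Hypothetical URLs with the hashes of the three example pages.
page_hashes = {
    "/cheap-apartments-manhattan": "a-b-c-d",
    "/apartments-manhattan": "a-b-c-d-e",
    "/one-bedroom-apartments-manhattan": "a-b-c-d",
}

collisions, rate = collision_report(page_hashes)
print(collisions)      # the two URLs that share the hash "a-b-c-d"
print(priority(rate))  # critical (2 of 3 pages collide)
```

On a real site, the interesting part of the report is usually the grouping itself: seeing which categories or filters collide again and again tells you where to change the listing logic.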
It took many iterations, but over time we were able to eliminate nearly all recurring hash collisions in all categories and sub-categories.
Choosing a winner
The more hardcore strategy is to choose a winner based on the hashes.
In the apartment example, “cheap apartments manhattan” (P1) and “one bedroom apartments manhattan” (P3) collide. You cannot allow these internal competition scenarios to occur systematically (in more than 5% to 20% of all cases), as you would systematically spam yourself with internal duplicate list pages. So if you cannot solve it in another way, you will have to choose winners and losers systematically.
- Winners are communicated to Google, listed in the sitemap.xml, and linked internally.
- Losers do not exist for Google: they either get deleted or are set to noindex, are not listed in the sitemap.xml, and are not linked internally.
As both list landing pages are equal from the content perspective, you must choose the winner via another metric, e.g., shorter URL, more search demand, bigger city, higher margins, … There is no perfect logic, but “a winner choose you must.”
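Choosing winners systematically could be sketched like this. Everything specific here is an assumption for illustration: the `ListPage` fields, the URLs, the made-up search demand numbers, and the tie-break order (more search demand first, then shorter URL). Swap in whatever metric fits your business.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ListPage:
    url: str            # hypothetical URL
    list_hash: str      # hash of the items visible on the first page
    search_demand: int  # hypothetical metric, e.g. monthly searches


def choose_winners(pages):
    """Return (winners, losers): exactly one winner per distinct hash."""
    groups = defaultdict(list)
    for page in pages:
        groups[page.list_hash].append(page)

    winners, losers = [], []
    for group in groups.values():
        # More search demand wins; shorter URL, then URL text, break ties.
        ranked = sorted(group, key=lambda p: (-p.search_demand, len(p.url), p.url))
        winners.append(ranked[0])
        losers.extend(ranked[1:])
    return winners, losers


# Made-up numbers, only to show the mechanics.
pages = [
    ListPage("/cheap-apartments-manhattan", "a-b-c-d", 900),
    ListPage("/apartments-manhattan", "a-b-c-d-e", 2400),
    ListPage("/one-bedroom-apartments-manhattan", "a-b-c-d", 700),
]

winners, losers = choose_winners(pages)
print([p.url for p in winners])  # sitemap.xml and internal links
print([p.url for p in losers])   # delete or noindex, no internal links
```

The sort key makes the decision deterministic, so the same page wins every day as long as the inputs do not change. That matters, because a winner that flips daily would itself send confusing signals.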
Who should do it?
Targeted list landing page hashes are not for small and medium-sized sites.
List landing page hashes are a concept for huge websites that work with a vast and growing inventory of stuff and a growing inventory of pages. Think yellow pages, real estate aggregators, huge online shops, job marketplaces, aggregators.
If you create your targeted list landing pages editorially, do not bother with targeted list landing page hashes. If you use another logic, e.g., search-based list landing pages, metadata multiplication list landing pages (a.k.a. filter pages), or expanding category and sub-category logics, then definitely use hashes, at least for quality control of internal duplicate lists (to know how big an issue internal duplicate lists actually are for your website). It’s easy to implement when you keep it in mind from the start; it’s hard to do if you implement it on top of an existing, historically grown list landing page logic.