Hackers who clone sites usually target bigger sites with lots of backlinks and, in most cases, thousands of internal pages (video sites, web shops, news portals and the like). That makes a clone very difficult to discover: traffic normally fluctuates anyway, and any backlinks coming from the clone itself get lost in the large volume of existing ones.
In most cases, a clone site (or several of them) is only discovered when a huge traffic drop happens and the webmaster starts investigating the cause.
Detection methods for clone sites
- unusual surge in links to your site – check whether an unusually large number of backlinks comes from a site you don’t recognize (usually hundreds of links from the same site)
- do a Google search for an exact chunk of your content – pick a sentence you wrote yourself that is specific enough that no one else would have it, search for it in quotes, and check the results for potential intruders
- site clones are usually put on totally unrelated and possibly hacked domain names, so check your logs and backlinks for any odd entries (for example, your website is tech related and you see backlinks popping up from carpet-cleaning-company-something.de or similar websites)
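The backlink checks above can be partly automated. Here is a minimal Python sketch, assuming you have exported your backlinks (or log referrers) as a plain list of URLs. The function names, the `known_domains` set and the threshold of 100 are my own illustrative choices, not part of any particular SEO tool.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_referring_domains(backlink_urls):
    """Count how many backlinks come from each host name."""
    hosts = Counter()
    for url in backlink_urls:
        host = urlsplit(url).hostname  # None for malformed or scheme-less entries
        if host:
            hosts[host] += 1
    return hosts


def suspicious_domains(backlink_urls, known_domains, threshold=100):
    """Return (host, count) pairs for unrecognized hosts with many backlinks.

    threshold is an assumed cut-off; tune it to your site's normal link profile.
    """
    known = {domain.lower() for domain in known_domains}
    counts = count_referring_domains(backlink_urls)
    return [
        (host, n)
        for host, n in counts.most_common()
        if n >= threshold and host not in known
    ]
```

Feed it the exported URLs plus the set of domains you already recognize. Anything it returns is only a candidate for a manual look, not proof of a clone.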
Keep in mind that while doing such investigations, you will most likely run into chunks of your content that have been scraped by other websites. Site scraping and site cloning are NOT the same thing. Scraped content means that someone manually or automatically copied (pretty much copy-pasted) bits and pieces of your content onto their own website. Scraping is a similar but not identical problem; it is usually done to a much lesser extent than actual site cloning, and its negative impact on the original website is usually much smaller.
Identifying your website clone
With the methods described above, you’ve probably discovered some URLs that shouldn’t be there, so visit them (keep an anti-malware program active, just to be safe).
Don’t be too shocked to see your own website design, probably even your own logo if it’s graphical. Hackers rarely bother to change such details.
Now, there is still a chance you’re just looking at someone who stole your website design or just some pages and did a bad job of covering their tracks, so do a further check to see whether you’re dealing with a real, live clone of your website.
Check if it is a real-time clone of your website:
- make a simple text file, or add a new post / page if you’re running a WordPress website, with any name you like, for example hello-hacker.txt
- publish it or, if it is a file, upload it to your domain so that the direct link is www.yoursite.com/hello-hacker.txt
- visit that same URL on the site you suspect is a clone – www.clone-of-your-site.com/hello-hacker.txt
- if www.clone-of-your-site.com/hello-hacker.txt actually displays your file in its original form, or even with slightly changed content, congratulations, you’ve discovered a clone of your website
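If you have several suspect domains to go through, the file test above can be scripted. This is a rough sketch using only the Python standard library. It assumes you put a random marker string inside hello-hacker.txt, so the check still works when the clone slightly changes the rest of the content. The helper names and the marker format are made up for illustration.

```python
import secrets
import urllib.request


def make_token():
    """Random marker to paste into hello-hacker.txt; nobody else can guess it."""
    return "clone-check-" + secrets.token_hex(8)


def body_contains_token(body, token):
    """True if the marker survives in the fetched page, even if the rest was altered."""
    return token in body


def fetch_text(url, timeout=10):
    """Return the body of url as text, or None on an HTTP or network error."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError:  # URLError, HTTPError and socket timeouts are all OSError subclasses
        return None


def is_live_clone(clone_url, token):
    """Fetch the suspect URL and report whether your marker shows up there."""
    body = fetch_text(clone_url)
    return body is not None and body_contains_token(body, token)
```

Generate the marker once with `make_token()`, save it into the file on your own site, then call `is_live_clone("http://www.clone-of-your-site.com/hello-hacker.txt", token)` for each suspect. A False result only means this one request did not return your marker, so a quick manual visit is still worthwhile.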
Sneaky type of clone site
During my investigations, I’ve discovered some clone sites that, when visited, display a full page of ads. That is because they feed Googlebot the original website content and show advertisements to human visitors. There are a few ways to check and confirm that. The simplest is to check the cached version within the Google result.
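Another way to check for this, sketched below in Python, is to request the same URL twice, once with a Googlebot-style User-Agent and once with a browser one, and compare what comes back. Treat it as a hint only: it assumes the clone decides what to serve based on the User-Agent header, and a clone that verifies Googlebot by IP address will not be fooled. The function names and the 0.5 similarity threshold are my own assumptions.

```python
import difflib
import urllib.request

# Googlebot's classic desktop User-Agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Any ordinary browser string works here; this one is just an example.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"


def fetch_as(url, user_agent, timeout=10):
    """Fetch url while presenting the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def similarity(a, b):
    """Ratio between 0.0 and 1.0 of how alike two pages are (1.0 means identical)."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def looks_cloaked(bot_html, browser_html, threshold=0.5):
    """True when the two responses differ more than the assumed threshold allows."""
    return similarity(bot_html, browser_html) < threshold
```

Usage would look like `looks_cloaked(fetch_as(url, GOOGLEBOT_UA), fetch_as(url, BROWSER_UA))`. Pages with rotating ads or timestamps never match exactly, which is why this compares a similarity ratio instead of testing for equality.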
Now that you’ve found the clone of your site, or a few of them, you can start taking them down. I’ll cover that in upcoming posts.