PDF as duplicate content: good or bad?
Category: tools | December 8th, 2008
I recently came across Tabbloid, an online tool that automatically picks up your RSS feed, converts it into a PDF file and mails it back to you (thanks to Eoin O’Leary from the Irish Internet Marketing Clinic).
The PDF is really nicely formatted in a two-column layout, and even graphics are scaled in nicely…
So far, so good. OK, but what is it good for?
It was recommended to me to put it up on my website as a visitor service (OK, I understand that) and that this should help my website’s rankings (huh?).
Well, knowing that there’s no such thing as an explicit “duplicate content penalty”, I still believed that duplicate content would hurt me: the search engines have to pick one version and disregard the others, so the ranking weight would be spread over the various versions and the rankings weakened. Well, in general I was wrong, at least regarding Google. Google actually gathers all “duplicate content” pages/versions, tosses them into one big pot, computes a combined relevance factor and then picks one of the pages to represent that group in the SERPs. (See >Demystifying the “duplicate content penalty”< on Google’s Webmaster Central Blog from Sept. 12th, 2008.)
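To make the “one big pot” idea concrete, here is a purely illustrative toy model, not Google’s actual algorithm. It assumes exact-match hashing of normalised text as a stand-in for whatever near-duplicate detection a search engine really uses, and the rule for picking the representative (highest individual score wins) is hypothetical. The point it shows: copies in the same cluster pool their score instead of splitting it.

```python
import hashlib
import re
from collections import defaultdict

def fingerprint(text: str) -> str:
    """Hash of the text after collapsing whitespace and lower-casing it."""
    normalised = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

def consolidate(pages: dict[str, tuple[str, float]]) -> dict[str, tuple[str, float]]:
    """Group URLs whose text has the same fingerprint (toy model only).

    pages maps url -> (text, link_score); the result maps each fingerprint
    to (representative_url, combined_score) for that cluster.
    """
    clusters: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for url, (text, score) in pages.items():
        clusters[fingerprint(text)].append((url, score))

    result = {}
    for fp, members in clusters.items():
        # Pool the scores of every copy instead of splitting them.
        combined = sum(score for _, score in members)
        # Hypothetical choice: the copy with the highest own score represents the cluster.
        representative = max(members, key=lambda m: m[1])[0]
        result[fp] = (representative, combined)
    return result
```

With an HTML page scored 5.0 and a PDF copy of the same text scored 1.0, the cluster is represented by the HTML URL with a combined score of 6.0. Note that in this toy model a PDF whose extracted text differs even slightly from the HTML ends up in its own cluster, which is exactly the kind of risk that matters if the PDF is not recognised as a duplicate.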
Matt Cutts, on the other hand, recommends that “…it’s helpful to try to pick one of those articles and exclude the other version from indexing, …” (see his blog post >Duplicate content question< from Feb. 1st, 2008).
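If you follow that advice for a PDF copy, note that a PDF has no HTML head section, so a robots meta tag can’t go into it. Google supports the X-Robots-Tag HTTP response header for non-HTML files instead. The sketch below assumes the PDFs are served by a small Python server of your own (a hypothetical setup; on a typical site you would set the same header in the web server configuration) and adds X-Robots-Tag: noindex to every .pdf response, leaving the HTML version as the indexable one.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

def extra_headers(path: str) -> dict[str, str]:
    """Return additional response headers for a requested path."""
    # Ignore any query string when checking the file extension.
    if path.split("?", 1)[0].lower().endswith(".pdf"):
        return {"X-Robots-Tag": "noindex"}
    return {}

class NoIndexPdfHandler(SimpleHTTPRequestHandler):
    """Serves files from the current directory, marking PDFs as noindex."""

    def end_headers(self) -> None:
        # end_headers() runs after the normal headers are queued,
        # so this is the place to append our own.
        # self.path may be missing if the request line could not be parsed.
        for name, value in extra_headers(getattr(self, "path", "")).items():
            self.send_header(name, value)
        super().end_headers()

def serve(port: int = 8000) -> None:
    """Start the server on localhost (blocks until interrupted)."""
    ThreadingHTTPServer(("127.0.0.1", port), NoIndexPdfHandler).serve_forever()
```

Calling serve() and requesting a PDF should show the extra header in the response, while HTML pages are served unchanged.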
First of all, IMHO these two statements contradict each other. But what do we make of this, especially with regard to the topic of this post?
Let’s first assume that the Webmaster Central Blog post is the more valid one now, AND that Google really identifies the PDF version as duplicate content of the HTML version and groups them together when calculating the ranking value. Given that, it shouldn’t make a difference whether we exclude the PDF from indexing or not, unless the PDF receives additional links from outside (or maybe also from inside). By additional I mean links from sites/pages that don’t already link to the HTML version…
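“Additional” links in this sense are simply a set difference: the sources linking to the PDF minus the sources already linking to the HTML version. A minimal sketch, assuming you have the links as hypothetical (source_url, target_url) pairs, for example exported from whatever link report you use; by_site switches between comparing individual pages and whole sites (host names).

```python
from collections.abc import Iterable
from urllib.parse import urlsplit

def additional_sources(
    links: Iterable[tuple[str, str]],
    pdf_url: str,
    html_url: str,
    by_site: bool = False,
) -> set[str]:
    """Sources linking to the PDF that don't already link to the HTML version.

    With by_site=True, sources are compared by host name instead of full URL.
    """
    links = list(links)  # allow a one-shot iterator to be scanned twice

    def key(url: str) -> str:
        return (urlsplit(url).hostname or "") if by_site else url

    to_pdf = {key(src) for src, dst in links if dst == pdf_url}
    to_html = {key(src) for src, dst in links if dst == html_url}
    return to_pdf - to_html
```

If this set is empty, the PDF brings no new link sources, and (under the assumption above) indexing it or not should make no difference.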
Well, I don’t know if that’s really worth the risk of potentially splitting the ranking strength over multiple URLs in case Google doesn’t recognise the PDF as part of the “group”. And even if it is worth it, the question remains whether Tabbloid is the tool of choice for that purpose…
Well, I’m experimenting with it the other way around: I added some feeds I want to read but somehow keep forgetting to. I have to admit I’m a terrible feed reader, so I’m trying to trick myself that way.
I’ll keep you updated on how I’m doing with that…
December 9th, 2008 at 2:11 pm
Hi,
Thank you for passing on the credit; it’s great to see Twitter at work.
From my experience with PDFs (though not directly with Tabbloid), I have seen a jump in PageRank for the HTML version when following the Feb 1st guidelines:
1) Don’t submit your articles (PDFs) for syndication all over the place.
My recommendation: submit to your key syndication websites only.
In other words, whichever syndication websites give you the best results.
2) Make sure that you include a link to the original content.
I add a link back to the HTML version at the bottom of every PDF.
When using Tabbloid, which I have only started to test, you may have to edit the PDF to add an HTML link, as the links all point back to the RSS feed.
Regards,
Eoin
December 10th, 2008 at 11:15 pm
[...] post is a translation of my post on seoxplorer.com: >PDF als “Duplicate Content” gut oder schlecht?< (“PDF as duplicate content: good or bad?”). Eoin O’Leary has already left me a comment on it, which I [...] here
March 4th, 2010 at 7:41 pm
People should be able to easily save the content of a page by downloading a PDF (nice functionality). I hope Google takes it into consideration. Does anyone have actual experience with offering content as a PDF download and seeing how it affected PR and ranking?