Export a blog (not mine) as a PDF document

Is it possible to export the posts of a blog as a PDF?

There is a blog with a couple of hundred posts that I’d like to read on my ereader, so I was wondering whether I could extract those posts into a PDF.

Thanks for the help!

2 Answers
This kind of content scraping is frowned on. You’re trying to grab someone else’s content and put it into a transferable format over which they have no control. Yes, your intentions are sincere, but don’t be surprised if several people are hesitant to help.

You do have some options, though.

Contact the blog author

This is the option I’d recommend over all else. Just send them an email and ask for a PDF version of their content that you can put on your eReader. Asking will prevent any questions over whether or not you’re stealing their content. It also lets the author know their stuff is popular enough to publish it as an eBook.

There are several plug-ins around that allow blog owners to export their entire site as a PDF eBook, so it’s relatively quick and simple for the site owner to create a PDF for you.

Anthologize is a great option for doing just this.

Import the site via RSS

If you have an Internet-enabled ebook reader (e.g. Kindle, iPad) you can just browse the site’s RSS feed. However, keep in mind that this feed will likely only include the latest 10 or so articles, not the entire site’s content.
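To see how many posts a feed really exposes, you can list its items. This is only a rough sketch: it assumes the feed is plain RSS 2.0 (Atom feeds use different element names), and `list_feed_items` is a made-up helper name, not part of any plug-in or service mentioned here.

```python
import xml.etree.ElementTree as ET


def list_feed_items(rss_text):
    """Return (title, link) pairs for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        # findtext returns the default when the child element is missing.
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        items.append((title, link))
    return items
```

If `len(list_feed_items(feed_text))` comes back as 10 or so for a blog with hundreds of posts, the feed alone won’t get you the full archive.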

You can also use a plug-in like Anthologize on a WordPress blog of your own to import the content and then export everything as a PDF.

Use an online service

There are a few online services that will read through and scrape the content of a site to create a PDF. However, they often use the RSS feed of the site and will be limited to the latest posts only.

Zinepal is the latest one I’ve found.

In Summary

The reality of your situation is that there’s no easy, stock way to grab all the content of a site and create a PDF, mostly because it is a frowned-upon practice, as I mentioned at the beginning. If you wanted to write your own script, though, here’s what it would need to do:

  1. Find the blog’s sitemap
  2. Pull down the content of every page listed in the sitemap
  3. Create a PDF from the fetched content

But this method depends on the blog actually having a sitemap, which isn’t guaranteed.
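A minimal sketch of the first two steps, using only Python’s standard library, might look like the following. It assumes the sitemap follows the sitemaps.org XML format; the function names and the one-second delay are my own choices for illustration. Step 3 is left out on purpose, since turning the fetched HTML into a PDF needs a separate conversion tool of your choosing.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text):
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


def fetch(url, timeout=30):
    """Download one URL and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def collect_pages(sitemap_url, delay=1.0):
    """Steps 1 and 2: read the sitemap, then fetch each page it lists."""
    pages = {}
    for url in parse_sitemap(fetch(sitemap_url)):
        pages[url] = fetch(url)
        time.sleep(delay)  # pause between requests to go easy on the server
    return pages
```

You would call it as `collect_pages("https://example.com/sitemap.xml")`, with the real sitemap address in place of that placeholder. Note that some sites publish a sitemap index that points at further sitemaps; this sketch doesn’t follow those and would try to fetch them as if they were ordinary pages.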

So, you can either depend on RSS and use an existing service, depend on a sitemap and create your own script, or contact the blog author directly and make a request. I’d recommend contacting the author. Most of us would be thrilled to get such a request.
