Docusaurus is, amongst other things, a Markdown-powered blogging platform. My blog has lived happily on Blogger for the past decade. I'm considering moving, but losing my historic content as part of the move was never an option. This post goes through what it would look like to move from Blogger to Docusaurus without losing your content.
It is imperative that the world never forgets what I was doing with jQuery in 2012.
Everything is better when it's code. Infrastructure as code. Awesome, right? So naturally "blog as code" must be better than just a blog. More seriously, Markdown is a tremendous documentation format. Simple, straightforward and, like Goldilocks, "just right". For a long time I've written everything as Markdown. My years of toil down the Open Source mines have preconditioned me to be very MD-disposed.
I started out writing this blog a long time ago as pure HTML. Not the smoothest of writing formats. At some point I got into the habit of spinning up a new repo in GitHub for a new blogpost, writing it in Markdown and piping it through a variety of tools to convert it into HTML for publication on Blogger. As time passed I felt I'd be a lot happier if I wasn't creating a repo each time. What if I did all my blogging in a single repo and used that as the code that represented my blog?
Just having that thought laid the seeds for what was to follow:
- An investigation into importing my content from Blogger into a GitHub repo
- An experimental port to Docusaurus
- The automation of publication to Docusaurus and Blogger
We're going to go through the first two now. But before we do that, let's create ourselves a Docusaurus site for our blog:
The first thing to do was obtain my blog content. This is a mass of HTML that lived inside Blogger's database. (One assumes they have a database; I haven't actually checked.) There's a "Back up content" option inside Blogger to allow this:
It provides you with an XML file with a dispiritingly small size. Ten years of blogging? You'll get change out of 4MB, it turns out.
We now want to take that XML and:
- Extract each blog post (and its associated metadata; title / tags and whatnot)
- Convert the HTML content of each blog post from HTML to Markdown and save it as a
- Download the images used in the blogpost so they can be stored in the repo alongside
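To make the third step a little more concrete, here's a minimal sketch of how the image URLs might be pulled out of a post's HTML before they're downloaded. The names `extractImageUrls` and `localImageName` are hypothetical, and a regular expression is standing in for a proper HTML parser, so treat this as an illustration under those assumptions rather than the actual script:

```typescript
// Hypothetical helper: collect the distinct src values of <img> tags.
// A regex is a rough stand-in for a real HTML parser and will miss
// unusual markup (unquoted attributes, for instance).
function extractImageUrls(html: string): string[] {
  const imgSrc = /<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/gi;
  const urls: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = imgSrc.exec(html)) !== null) {
    // Skip URLs we've already seen so each image is downloaded once
    if (urls.indexOf(match[1]) === -1) {
      urls.push(match[1]);
    }
  }
  return urls;
}

// Hypothetical helper: derive a local file name from the last path
// segment of an image URL, ignoring any query string or fragment.
function localImageName(url: string): string {
  const withoutQuery = url.split(/[?#]/)[0];
  const segments = withoutQuery.split("/").filter((s) => s.length > 0);
  return segments.length > 0 ? segments[segments.length - 1] : "image";
}
```

Each URL returned by `extractImageUrls` could then be handed to whatever does the downloading, with `localImageName` deciding what the file is called in the repo.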
To do this we're going to whip up a smallish TypeScript console app. Let's initialise it with the packages we're going to need:
- fast-xml-parser to parse XML
- he, jsdom and showdown to convert HTML to Markdown
- axios to download images
- typescript to code in and
- ts-node to make our TypeScript Node.js console app.
Now we have all the packages we need, it's time to write our script.
To summarise what the script does, it:
- parses the blog XML into an array of
- each post is then converted from HTML into Markdown, a Docusaurus header is created and prepended, then the file is saved to the
- the images of each post are downloaded with Axios and saved to the
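As a rough illustration of the "Docusaurus header is created and prepended" step, here's a minimal sketch. The `PostMeta` shape and the function names are assumptions of mine, not the script's real ones. It leans on two things: Docusaurus reads front matter delimited by `---`, and it can pick up a post's date from a `YYYY-MM-DD-` prefix on the file name.

```typescript
// Hypothetical shape for the metadata extracted from one Blogger post
interface PostMeta {
  title: string;
  published: string; // ISO 8601 timestamp, e.g. "2012-01-14T09:00:00.000Z"
  tags: string[];
}

// Turn a title into a URL-friendly slug: lowercase, hyphen-separated
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Build a date-prefixed file name so the date travels with the file
function fileNameFor(meta: PostMeta): string {
  const date = meta.published.slice(0, 10); // keep just YYYY-MM-DD
  return `${date}-${slugify(meta.title)}.md`;
}

// Build the front matter block. JSON.stringify is used for quoting,
// since JSON strings are also valid YAML double-quoted strings.
function frontMatter(meta: PostMeta): string {
  const tags = meta.tags.map((t) => JSON.stringify(t)).join(", ");
  return [
    "---",
    `title: ${JSON.stringify(meta.title)}`,
    `tags: [${tags}]`,
    "---",
    "",
  ].join("\n");
}

// Prepend the header to the converted Markdown body
function toMarkdownFile(meta: PostMeta, markdownBody: string): string {
  return frontMatter(meta) + "\n" + markdownBody;
}
```

Quoting the title matters more than you'd think: titles containing a colon or quote marks would otherwise produce invalid YAML.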
To run the script, we add the following script to the
And have ourselves a merry little yarn start to kick off the process. In a very short period of time, if you crack open the blogs directory of your Docusaurus site you'll see a collection of Markdown files which represent your blog and are ready to power Docusaurus:
I have slightly papered over some details here. For my own case I discovered that I hadn't always written perfect HTML when blogging. I had to go in and fix the HTML in a number of historic blogs such that the mechanism would work. I also learned that a number of my screenshots that I use to illustrate posts have vanished from Blogger at some point. This makes me all the more convinced that storing your blog in a repo is a good idea. Things should not "go missing".
Congratulations! We're now the proud owners of a Docusaurus blog site based upon our Blogger content that looks something like this:
Now that I've got the content, I'm theoretically safe to migrate from Blogger to Docusaurus. I'm pondering this now and I have come up with a checklist of criteria to satisfy before I do. You can have a read of the criteria here.
Odds are, I'm likely to make the move; it's probably just a matter of time.