Gatsby Multi Website Deployments

AppFoundry, Mar. 27, 2020

Within our company we started using GatsbyJS last year to generate web applications. Gatsby allows us to build blazing fast websites with React.  

In one of our projects, one of the business rules was that all data had to be statically available, so no API calls were allowed. This means all the data for all pages has to be prepared during the build process. During load testing we quickly ran into the limits of GatsbyJS and a single Node.js process. As the data grew, the GatsbyJS process started using enormous amounts of RAM (8GB+), eventually exceeding what our server hardware had available.

This was to be expected, but how do we solve this problem?

Splitting up the build process

The idea we came up with was to split up the build process. Splitting it up significantly reduces the RAM usage of each subprocess.

Sadly, this is not something Gatsby supports by default. What's more, Gatsby doesn't allow you to change the build output folder, and every new build erases the previous one. We are not the only ones facing this challenge: it is an open issue on the Gatsby GitHub.

Luckily, Gatsby does offer quite an extensive API that allows us to hook into almost every part of the build process. The onPostBuild hook is triggered once all parts of the build are done and allows us to preserve the output folder in the following way:

export const onPostBuild = (): void => {
	try {
		// Copy public folder content to build
		copyFolderRecursiveSync(publicDir, buildDir)
	} catch (error) {
		console.error('Error during build replication hook', error)
	}
}

At the end of the build process we copy the contents of the public folder to a new directory. Once a new build is started the public folder gets erased but our own managed output folder persists through all the processes.  
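The copyFolderRecursiveSync helper is not shown in this post. A minimal sketch of what such a helper could look like, using only Node's built-in fs and path modules, is given below. Merging into an existing target folder (rather than replacing it) is an assumption based on the description above.

```javascript
const fs = require('fs')
const path = require('path')

// Hypothetical helper: recursively copies the contents of `source` into
// `target`, creating `target` if needed and keeping files that are already there.
const copyFolderRecursiveSync = (source, target) => {
	fs.mkdirSync(target, { recursive: true })

	for (const entry of fs.readdirSync(source, { withFileTypes: true })) {
		const from = path.join(source, entry.name)
		const to = path.join(target, entry.name)

		if (entry.isDirectory()) {
			copyFolderRecursiveSync(from, to)
		} else {
			// Overwrites a file with the same name from an earlier build
			fs.copyFileSync(from, to)
		}
	}
}
```

Because existing files in the target are left alone unless a newer file has the same name, the output of earlier batches survives when a later batch is copied in.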

An important decision when splitting up the build process is how finely to split it. We decided to split on two levels: main slugs and their sub slugs. For example, a main slug is a region, and its sub slugs are the countries in that region.

Main slugs

The main site slugs (e.g. a region) are the starting point of a build process. Each main site slug is deployed and published separately to the CDN (e.g. S3 with CloudFront). The build is started with a custom script instead of the default Gatsby build command.

const { spawn } = require('child_process')

const mainSiteSlugs = getMainSiteSlugsFromArgs()

const buildMainSlugSite = slug => {
	const buildWebsite = spawn('node', ['build-subwebsite.js', slug])
	
	buildWebsite.stdout.on('data', data => {
		console.log(`OUT [${slug}]: ${data}`)
	})

	buildWebsite.stderr.on('data', data => {
		console.error(`ERROR [${slug}]: ${data}`)
	})

	buildWebsite.on('close', code => {
		console.log(`Main site [${slug}] build process finished with code ${code}`)
		if (mainSiteSlugs.length) {
			buildMainSlugSite(mainSiteSlugs.shift())
		}
	})
}

buildMainSlugSite(mainSiteSlugs.shift())

Each main slug starts a new Node.js process, which builds all of its sub websites. Once a main slug build is finished, the close handler starts the next one, and this repeats recursively until all main slugs are done.

The main slugs are built sequentially, one at a time, for two reasons:

We don't want to exceed the RAM our hardware has available. For this project we also had no budget to scale the hardware horizontally.

If we built and deployed the main slugs in parallel, we would have the extra complexity of tracking which main slugs are already deployed. If we have an overview page of the main slugs, we need to know which slugs are currently available.
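The one-at-a-time behaviour can also be expressed with promises instead of a recursive close handler. The following is a sketch under that assumption, not the code used in the project; runProcess and runSequentially are hypothetical names.

```javascript
const { spawn } = require('child_process')

// Wraps a child process in a promise that resolves with its exit code.
const runProcess = (command, args) =>
	new Promise((resolve, reject) => {
		const child = spawn(command, args, { stdio: 'inherit' })
		child.on('error', reject)
		child.on('close', code => resolve(code))
	})

// Runs `worker` for each item and starts the next item only after the
// previous promise has resolved, so at most one build is active at a time.
const runSequentially = async (items, worker) => {
	const results = []
	for (const item of items) {
		results.push(await worker(item))
	}
	return results
}

// Possible usage with the script from this post:
// runSequentially(mainSiteSlugs, slug =>
// 	runProcess('node', ['build-subwebsite.js', slug])
// )
```

Since each iteration awaits the previous build, peak memory stays at the level of a single build, and the list of finished slugs is always a prefix of the input list.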

Sub slugs

The sub website slugs (e.g. countries) are batched together in groups, and each batch starts a new Gatsby build process. We batch the sub website slugs because every Gatsby build needs time to bootstrap. Building in batches significantly reduces the total time spent bootstrapping.
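To make the trade-off concrete: with a batch size of 20, a main slug with 45 sub slugs needs 3 Gatsby bootstraps instead of 45. The batching step can be sketched as a pure function. toBatches and the generated country names are hypothetical; the project code slices off one batch per recursive call instead.

```javascript
// Splits `items` into consecutive groups of at most `size` elements.
const toBatches = (items, size) => {
	if (!Number.isInteger(size) || size < 1) {
		throw new Error('Batch size must be a positive integer')
	}
	const batches = []
	for (let start = 0; start < items.length; start += size) {
		batches.push(items.slice(start, start + size))
	}
	return batches
}

// Made-up example data: 45 sub slugs
const countries = Array.from({ length: 45 }, (_, i) => `country-${i + 1}`)
const batches = toBatches(countries, 20)
console.log(batches.map(batch => batch.length)) // [ 20, 20, 5 ]
```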

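const { spawn } = require('child_process')
const R = require('ramda')
const COUNTRY_BATCH_AMOUNT = 20

// Main slug passed on the command line by the main slug script
const mainSiteSlug = process.argv[2]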

const buildSubsite = countriesLeftover => {
	const currentCountryBatch = countriesLeftover.slice(
		0,
		COUNTRY_BATCH_AMOUNT
	)
	const leftoverCountriesForNextBuild = R.difference(
		countriesLeftover,
		currentCountryBatch
	)

	const buildWebsite = spawn('npm', [
		'run',
		'build',
		mainSiteSlug,
		JSON.stringify(currentCountryBatch)
	])

	buildWebsite.stdout.on('data', data => {
		console.log(`${data}`)
	})

	buildWebsite.stderr.on('data', data => {
		console.error(`${data}`)
	})

	buildWebsite.on('close', code => {
		console.log(`Sub site build process finished with code ${code}`)
		if (leftoverCountriesForNextBuild.length) {
			buildSubsite(leftoverCountriesForNextBuild)
		}
	})
}

buildSubsite(allCountriesToBuild)

Each batch of sub slugs starts a new Gatsby build process, and this repeats recursively until all sub slugs are built. Once all sub slugs are finished, the sub-site process exits. This fires the close event in the main slug process, which can then start building the next main slug.
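How the Gatsby build reads the main slug and the batch is not shown in this post. One possible approach, assuming both values arrive as the last two command-line arguments exactly as the spawn call above passes them, is sketched here. parseBuildArgs and the example values are hypothetical.

```javascript
// Hypothetical helper: extracts the main slug and the JSON-encoded batch of
// sub slugs from an argv-style array (assumes they are the last two entries).
const parseBuildArgs = argv => {
	const [mainSiteSlug, batchJson] = argv.slice(-2)
	const countries = JSON.parse(batchJson)
	if (!Array.isArray(countries)) {
		throw new Error('Expected a JSON array of sub slugs')
	}
	return { mainSiteSlug, countries }
}

// Example with a made-up argv
const { mainSiteSlug, countries } = parseBuildArgs([
	'node',
	'gatsby',
	'build',
	'europe',
	'["belgium","france"]'
])
console.log(mainSiteSlug, countries) // europe [ 'belgium', 'france' ]
```

In a real build, the array passed in would be process.argv, and the parsed batch could then be used to decide which pages to create.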

The process

Get all main slugs from the data package
For each main slug
1. Start main slug build process
2. Batch the sub slugs and start the Gatsby build process
3. Upload build output for the main slug and all its sub slugs to the CDN
Build and upload the index pages (e.g. the region selection page).

Conclusion

Gatsby has no built-in feature to split up the build process, but it does have an extensive API to hook into the build process. We used the onPostBuild hook to customise the build output folder and split up the build process from there by main and sub slugs.