I’m sure we’ve all seen our fair share of migrations in our careers. Some more successful than others, no doubt.
If you’ve struggled to pull off a migration, I’m here to tell you that migrations are hard and you’re not alone.
It’s easy to start out thinking it’s going to be a straight line to the end where everything’s going to be better right away.
Photo by Derek Thomson on Unsplash.
In reality, of course, it’s often a lot more like this. An uphill path full of twists and turns and surprises.
Photo of Trollstigen in Norway by Ivars Utināns on Unsplash.
Unforeseen hiccups and bends in the road; interruptions that take your focus away; a seemingly short distance that takes a much longer time to cover than you expected.
I’m going to talk to you about the bends in the road we faced when embarking on a big technical migration, how we navigated them, and some of the lessons we learned.
And how to know whether you should be going down this road at all.
My name is Sophie, I’m a senior web developer and web discipline lead at Monzo.
I have a website where I occasionally write about tech and mostly just share recipes I like.
You can find me in various internet places:
Monzo is the hot coral bank that lives on your phone. But we have web things too! This is one of them: monzo.com, our very pretty marketing website, where you can find information about all our products, plus our help articles.
We’ve also got a web banking app for our business customers. Also very pretty, and lets our business customers do things like view transactions, manage pots and make payments.
Then there’s our internal tooling. We have a suite of internal web apps, the biggest of which is BizOps, our customer support system.
BizOps enables our customer service team to do everything they need to. Chat, email and phone calls with customers all happen through BizOps, plus anything they may need to do on someone’s account to help them out. It’s a big back-office system. Where many other banks have lots of different systems and software for this, we have one for most cases. As you can imagine, it’s a very complex beast.
We recently migrated our web properties to Typescript. It took ages. This is actually the sticker we got made after we finished.
It actually took us just over two years, but spoiler alert: it came out pretty well in the end. Needless to say, migrations are not without challenge, especially in an organisation of our size.
Before Typescript we were using Flow. Flow is Facebook’s type system, but Typescript was a lot more popular, which meant that most of the libraries we use didn’t ship Flow types in their newer versions. That made it a lot harder to keep things up to date. We were pretty much stuck using an obsolete technology.
The Flow server couldn’t cope with the size of our monorepo and crashed all the time.
Our internal apps are powered by Next.js, and we couldn’t use its speedy Rust-based compiler because we were still using Babel.
It was something we talked about on and off for a while, along the lines of “yeah, we should probably do this at some point”, acknowledging that Flow was making it increasingly difficult to upgrade some of our tooling.
Migrating our web properties to Typescript wouldn’t be as simple as just running a codemod across the whole repo, though. For one thing, there’s just so much of it.
It can seem really intimidating when you’re right at the beginning and you feel like you’ve got a massive job ahead of you.
The best way to approach this is to start small.
In our case, a couple of web engineers looked into whether a pre-existing migration script would work for us at all, as a side project. They found a couple of options on GitHub, forked them, and had a bit of a hack to see if they could migrate one library.
They found something that worked, so the next step was to see if they could get the Typescript package to build alongside all our existing Flow code, locally and in CI.
Flow to Typescript is a well-trodden path at this point; in fact, a while after we started, Stripe open-sourced their Flow-to-Typescript migration code. With any technical migration like this, it’s always worth looking at what other people have done to see if you can save yourself some time.
You want to prove that this can work before you commit yourself to doing anything else.
If you try to get buy-in before you’ve even started looking into a solution, you run the risk of being shut down before you’ve been able to do anything. Without a clearly defined plan and timeline, you’re asking a load of folks with very little context to let you put what they see as a less important piece of maintenance ahead of prioritised team work.
It’s really important to have that space to experiment with something before you need to go and make a case for it.
Timebox the exploration effort, and communicate your progress clearly. Keep a log of the decisions that led you to choose a particular path or technology. Someone in the future will thank you for it, whether they’re doing something similar or trying to figure out the thought process behind the implementation.
Speaking of which, document the hell out of everything you do. Right from the beginning.
For every new engineer who joins your team, you’re going to need to get them up to speed with what you’re doing and why you’re doing it.
I left Monzo in 2021 and came back in mid-2022. In order to start contributing to the migration I needed to get up to speed with how it all worked and what the approach was. Our dedicated Notion space came in very handy.
Your documentation should cover every aspect of the migration, past, present and future. Treat it like a project. Your projects are all meticulously documented, I assume.
Someone with no context should be able to read the document and understand exactly what’s going on and who to speak to if they have any questions.
Once you’ve decided to go ahead, who does the work? Should you have one team working on the migration, or collaborate across different teams?
In an ideal world, you’d have a team whose job it is to do the migration. They’d laser-focus on that, and nothing else, for the duration of the project. For one thing, that vastly reduces the risk of the work not getting done.
If you share the work out across multiple teams, you don’t need to ring-fence a team for the duration of the migration, but you can expect things to take a lot longer: teams’ planning cycles vary massively, and this kind of work will never take priority.
The bigger your organisation gets, the more complicated this becomes. If you’re driving the migration centrally you can just do it for them.
We went for something in the middle. We centralised the work amongst our web team, as we realised that we’d never be able to get all the product teams to migrate things to Typescript, plus it was definitely going to be safer if done by a web developer.
However, we shared the work across the web engineering discipline, starting out with a small group of interested volunteers. We knew it’d probably take a while.
How should you tackle the work itself? Do it all at once, or bit by bit?
When you’re doing the work shared across multiple teams you don’t have a lot of choice here - you have to do it incrementally. But if you have the luxury of a dedicated team doing the migration, you can choose.
Some things have to be done in one go, such as major version upgrades of things like React, because you don't want multiple instances running.
If you’re replatforming your website onto a new static site generator or something similar, you may choose to do it all in one go rather than fiddling around with multiple generation steps or even different hosted versions of the site. That makes sense.
For much bigger codebases, a big-bang migration is certainly quicker, but inherently riskier.
When you’re going bit by bit, it’s a lot easier to identify what’s gone wrong and roll back if there’s an incident. It’s a much safer approach.
Our web projects live in a monorepo. We share technologies across most of our web apps. Everything is in React, with some apps on Next.js, and until recently all of our code used Flow for static typing.
We’ve got a lot of shared libraries that all live in the monorepo. Changes to one library can affect multiple apps, and multiple parts of the same app.
Many of those apps are also business-critical. We couldn’t afford to just migrate the entire BizOps app and yeet it into production when our ability to service our customers depends on it. We knew we’d have to be extremely diligent and test everything very carefully.
The flip side is that you have to be disciplined and make sure you actually keep going. It can be very challenging to maintain enthusiasm and pace, as we discovered.
There’s a real risk here that you go bit by bit, something else comes up, and you put it on pause - only to never resume it.
Of course, like with any migration, we hit a bend in the road pretty soon.
When contributing to the repo, other engineers would copy and paste old Flow packages to create new ones, increasing the amount of Flow code when we were trying to bring it down.
We needed to make it much easier to build new things in Typescript. So easy that it was the obvious choice. We did that with tooling.
We wrote a package generator that would bootstrap a Typescript library or component with everything you needed. This removed a lot of the manual steps that engineers struggled with, and meant all the new things coming in were Typescript.
Another thing we could have done here is add a static check, using something like Semgrep, to make sure we weren’t increasing the number of Flow files in the repository, failing CI and blocking deployments if we were.
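We didn’t build this check ourselves, so the following is only a minimal sketch of the idea, written in Typescript rather than as a Semgrep rule. It assumes Flow files can be identified by the `@flow` pragma in a comment near the top of the file, and that a baseline count is stored somewhere CI can read it. `SourceFile`, `isFlowFile`, `countFlowFiles` and `checkFlowRatchet` are hypothetical names.

```typescript
// Hypothetical CI "ratchet": fail the build if the number of Flow files grows.
interface SourceFile {
  path: string;
  contents: string;
}

interface RatchetResult {
  ok: boolean;
  count: number;
  message: string;
}

// Assumption: a Flow file opts in with an @flow pragma in one of its first few lines,
// e.g. "// @flow", "/* @flow */" or " * @flow strict" inside a block comment.
function isFlowFile(file: SourceFile): boolean {
  const header = file.contents.split("\n").slice(0, 5).join("\n");
  return /^\s*(\/\/|\/\*+|\*)\s*@flow\b/m.test(header);
}

function countFlowFiles(files: SourceFile[]): number {
  return files.filter(isFlowFile).length;
}

// `baseline` is the last agreed Flow file count, e.g. a number committed to the repo.
function checkFlowRatchet(files: SourceFile[], baseline: number): RatchetResult {
  const count = countFlowFiles(files);
  if (count > baseline) {
    return {
      ok: false,
      count,
      message: `Flow files went up from ${baseline} to ${count}. Please write new code in Typescript.`,
    };
  }
  return { ok: true, count, message: `Flow files: ${count} (baseline ${baseline}).` };
}

// Usage with in-memory files; a real check would read them from the repo.
const exampleFiles: SourceFile[] = [
  { path: "packages/utils/index.js", contents: "// @flow\nexport const one = 1;\n" },
  { path: "packages/ui/index.ts", contents: "export const two: number = 2;\n" },
];
const ratchet = checkFlowRatchet(exampleFiles, 1);
if (!ratchet.ok) {
  console.error(ratchet.message);
  // In CI you would exit with a non-zero code here to fail the build.
}
```

Because the check only fails when the count goes up, it behaves like a ratchet: every migrated package lowers the number, and the baseline can be lowered to match so it never creeps back.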
We’ve used ESLint in the past to make sure people aren’t using libraries and methods we’ve deprecated but not fully removed.
I can’t stress enough how important tooling was. If we’d done it all manually it would’ve taken twice as long.
A couple of our engineers wrote a script to migrate a package from Flow to Typescript. After running it, we’d give the output a manual pass to fix any stray `any` types and any outstanding type errors.
They documented it meticulously because they’re good people. We also had a space for documenting any issues that people had come across, and fixes for those.
Those scripts also generated flow types for the libraries we’d just migrated, so that any other apps that hadn’t yet been migrated would still have types for imported packages.
Our monorepo contains a lot of shared libraries, many of which have transitive dependencies on other libraries in the repo. This is obviously a vast simplification. We didn’t want to end up migrating a library to Typescript only to find that it imported a load of Flow libraries it couldn’t read any types from. We had to make sure we went from the top down: migrating the packages that had no internal dependencies first, which then unlocked the migration of the packages further down the tree.
One of our team wrote a script that would identify packages that were ready to migrate because all their internal dependencies had already been migrated to Typescript. We’d run that script now and then and ticket up the work that was ready to go.
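The core idea behind a script like that fits in a few lines. This is not our actual script, just a minimal sketch that assumes you’ve already collected each package’s language and its internal (in-repo) dependencies, for example from each package.json. `Pkg`, `readyToMigrate` and `migrationWaves` are hypothetical names, and the example dependency tree is made up.

```typescript
type Lang = "flow" | "typescript";

interface Pkg {
  name: string;
  lang: Lang;
  internalDeps: string[]; // other packages in the monorepo that this one imports
}

// A Flow package is ready once every internal dependency is already Typescript.
// Dependencies the script doesn't know about are treated as blocking, to be safe.
function readyToMigrate(pkgs: Pkg[]): string[] {
  const byName = new Map(pkgs.map((p) => [p.name, p] as const));
  return pkgs
    .filter((p) => p.lang === "flow")
    .filter((p) => p.internalDeps.every((dep) => byName.get(dep)?.lang === "typescript"))
    .map((p) => p.name)
    .sort();
}

// Simulates migrating each ready batch in turn, showing the order packages unlock in.
function migrationWaves(pkgs: Pkg[]): string[][] {
  let current: Pkg[] = pkgs.map((p) => ({ ...p }));
  const waves: string[][] = [];
  for (;;) {
    const ready = readyToMigrate(current);
    if (ready.length === 0) break;
    waves.push(ready);
    const readySet = new Set(ready);
    current = current.map((p): Pkg => (readySet.has(p.name) ? { ...p, lang: "typescript" } : p));
  }
  return waves;
}

// Usage with a made-up dependency tree.
const monorepo: Pkg[] = [
  { name: "utils", lang: "flow", internalDeps: [] },
  { name: "api-client", lang: "flow", internalDeps: ["utils"] },
  { name: "components", lang: "typescript", internalDeps: [] },
  { name: "bizops-app", lang: "flow", internalDeps: ["api-client", "components"] },
];
const firstBatch = readyToMigrate(monorepo); // ["utils"]
const allWaves = migrationWaves(monorepo); // [["utils"], ["api-client"], ["bizops-app"]]
```

Anything still in Flow once `migrationWaves` stops is blocked by a dependency cycle or by a dependency the script doesn’t know about, which is worth ticketing up separately.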
Especially with an incremental migration such as this, it became really important to measure our progress not only to keep morale up but so that we could see how close we were - or not - to actually finishing.
I put together a dashboard that showed the number of Flow packages in the repo vs the number of Typescript packages. It was an exciting day when the lines finally crossed over!
I wrote a script that piggybacked off our “which packages are ready to migrate” script and gave me the percentage of files that had been migrated. I posted this every month.
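A rough sketch of how that percentage could be calculated from a list of file paths. It assumes every `.js`/`.jsx` file still counts as Flow and every `.ts`/`.tsx` file counts as migrated, while `.d.ts` declarations and generated `.js.flow` stubs are ignored. `migrationProgress` is a hypothetical helper, not our actual script.

```typescript
interface Progress {
  typescriptFiles: number;
  flowFiles: number;
  percentMigrated: number;
}

// Assumption: .js/.jsx files are still Flow, .ts/.tsx files have been migrated.
function migrationProgress(paths: string[]): Progress {
  let typescriptFiles = 0;
  let flowFiles = 0;
  for (const path of paths) {
    if (path.endsWith(".d.ts")) continue; // declaration files aren't migrated code
    if (/\.tsx?$/.test(path)) typescriptFiles++;
    else if (/\.jsx?$/.test(path)) flowFiles++;
    // Anything else (including .js.flow stubs) is not counted either way.
  }
  const total = typescriptFiles + flowFiles;
  // Round down to one decimal place so 99.96% can't show up as 100%.
  const percentMigrated = total === 0 ? 100 : Math.floor((typescriptFiles / total) * 1000) / 10;
  return { typescriptFiles, flowFiles, percentMigrated };
}

const stats = migrationProgress(["a.ts", "b.tsx", "c.js", "d.d.ts", "e.jsx"]);
console.log(`${stats.percentMigrated}% of files migrated to Typescript`); // "50% of files migrated to Typescript"
```

Rounding down rather than to the nearest value means the stat only ever reads 100% once the last Flow file is gone.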
Towards the end I posted it every time the percentage changed.
Along with that measurement you need milestones - checkpoints along the way to indicate significant achievements in the migration.
Setting milestones not only made us feel like we were getting somewhere, but also helped us communicate our progress to stakeholders in easily understandable terms.
The first app we migrated was a milestone. Migrating our Business Banking web app was another. Migrating BizOps - the biggest app in our repo - was the final one.
By the way, if you’re not setting milestones like these for projects, you should definitely try it. It helps to break down a massive project into much more easily digestible chunks.
Another bend in the road showed up around mid-2023.
By this point we had the whole web discipline working on the migration, but we were still quite a way from the end. We’d hoped to get it done by the end of 2023 but it wasn’t looking likely.
We try to foster an environment where engineers are empowered to turn up to planning and say “By the way, I’m also doing this piece of discipline/maintenance work this week”, and plan their squad work to factor that in. That’s often easier said than done if you’ve got an important external deadline coming up, or if you’re not used to having that kind of say over the work you’re doing that week. Either way, things were moving slowly and there was a risk that we just wouldn’t finish it at all.
We’d been roughly aiming for the end of 2023, but with the support of my engineering director I turned this into a hard deadline. We had to get it finished by the end of the year so that we could be done with it and move on.
We ticketed up the remaining work, breaking down our biggest app into tickets which were then assigned to teams based on code ownership and team size. We then got buy-in from the engineering managers and tech leads of our teams with web engineers, to make sure they brought Typescript tickets into planning.
Every month I counted up the tickets remaining for each team, and posted an update with our progress, celebrating the teams who migrated the most that month and giving a percentage.
We reduced the scope of the migration so that we wouldn’t have to migrate every single app to consider it done, only the ones that were worked on most. This meant there would be a few apps left in Flow at the end, but they rarely get touched, and people could migrate them as and when they needed to.
As we were struggling to get things done, one idea was to strip out all the remaining Flow, leave that code with no type system, and then migrate things as and when we needed to. We’d be able to remove Babel if the code was just JavaScript. But ultimately we realised it would have taken just as long as migrating it: either way, you’d have to run a migration script and manually fix errors. So we pressed on.
We didn’t have to get particular buy-in for quite a long time, because at Monzo there’s a certain expectation that there’s maintenance and engineering discipline work that happens alongside squad work. For a long while it was a small group of people doing bits here and there.
Getting buy-in became really important later in the project once we’d proven the potential value, it was obvious that things were progressing slowly, and we knew we had to get things done. Without the backing of senior stakeholders, there was a risk that the work wouldn’t be deemed important enough to actually finish.
My engineering director was a great support - with his backing, other less technical leaders in the organisation were much more likely to let us get on with it.
We brought managers and leadership on board by clearly articulating the problems we were having now, and benefits we were expecting at the end. I made it really clear that we wouldn’t see the majority of these benefits (such as compile time improvements) until we actually finished.
It helps if you know what stakeholders care most about. Some of them will be focussed on risk — what’s the danger if you don’t do the migration. Others may be more focussed on cost, or developer velocity. Is being on an old technology making you less able to ship things? Is this holding you back from scaling? Know who you’re pitching to, and tailor it.
Stakeholders will want to know when you’ll see the benefits from all of this hard work.
With some migrations, it’s 0 to 1. You don’t see the value until it’s completely done.
For a lot of the Typescript stuff, that was the case. We could migrate 99% of packages, but for as long as there was any Flow code still in the app, we would be stuck using the old Babel compiler.
But, there was a small amount of incremental value in being able to work in Typescript. No JS typing system is without its infuriatingly obtuse type errors, so it's not as if we magically had fewer of those, but the documentation was a lot better and we were able to upgrade some of our libraries like our Apollo GraphQL client a lot sooner than we anticipated.
In my opinion, the greatest threat to any migration is not its complexity - it’s changing priorities that leave you unable to finish.
Make sure you – and the powers that be – understand the risks of not completing the migration. It’s easy to think that you can just pause it and carry on with your lives, but in some cases you’ll end up in a worse position than you were before you started.
I’m contractually obliged to include this xkcd comic about standards here. You may introduce one new technology to replace them all, but if you don’t finish it’ll be one more technology on the pile.
Perhaps you’ll do a bit, and then need to focus on something else, and then someone else will come along and add another new technology. Maybe you have a lot of different teams working on different parts of a website, and you’re all using slightly different approaches and different libraries.
It’s also common after mergers and takeovers, where the other companies’ systems and codebases get subsumed into the parent company’s and you end up with a lot of different tech stacks.
When there are 5 different ways of doing the same thing in your codebase, how does an engineer know which one is the “right” one?
What is their motivation to update the old and replace it with the new, if there’s still so much of the old stuff around?
If someone is paged for an incident and this particular system is written in Clojure when the rest of your systems are Ruby, what are they going to do?
Think about the end user, too: by adding a new library, are you increasing the bundle size your users have to download?
You’re also much more likely to get buy-in if you prove you’ve given a lot of thought to the safety of the migration: minimising disruption to operations, not inconveniencing people too much, and having an escape plan.
Any migration comes with risk. The important thing is how you deal with that risk, and prepare yourself for what can go wrong.
Type system migrations are fairly low-risk as migrations go, especially as we were moving incrementally. That’s not to say it went perfectly smoothly, of course. For the next section, though, I’ve drawn inspiration from some other recent migrations that carried a higher risk.
At Monzo, one of our values is “be hard on problems, not people”.
Incidents are an inevitable consequence of doing things and moving forward. And shipping code.
Accept that you’ll probably have some incidents when migrating things — that’s fine — the important thing is that you can recover from them quickly and safely, and learn from them.
We almost certainly had some bugs that arose from migrating some of our components to Typescript — we were able to quickly roll them back, fix them, and move forward again.
It may be appropriate to establish a set of guardrails: points at which you’re no longer happy with how things are going and will make the decision to roll back.
For example, if you’re migrating your website to a new platform, you may wish to monitor performance and load times and say “if load times consistently exceed x seconds, we will roll back.”
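To make “consistently” concrete, here’s a minimal sketch of such a guardrail. The threshold, window size and breach ratio are made-up parameters you’d agree with your stakeholders, and `Guardrail` and `shouldRollBack` are hypothetical names rather than anything from a monitoring tool.

```typescript
interface Guardrail {
  thresholdSeconds: number; // the "x seconds" agreed with stakeholders
  windowSize: number; // how many recent samples count as "consistently"
  maxBreachRatio: number; // e.g. 0.6 = roll back if more than 60% of the window is too slow
}

// Returns true when recent load times breach the guardrail often enough to roll back.
function shouldRollBack(loadTimesSeconds: number[], guardrail: Guardrail): boolean {
  if (loadTimesSeconds.length < guardrail.windowSize) return false; // not enough data yet
  const recent = loadTimesSeconds.slice(-guardrail.windowSize);
  const breaches = recent.filter((t) => t > guardrail.thresholdSeconds).length;
  return breaches / guardrail.windowSize > guardrail.maxBreachRatio;
}

// Hypothetical numbers for illustration.
const loadTimeGuardrail: Guardrail = { thresholdSeconds: 3, windowSize: 5, maxBreachRatio: 0.6 };
shouldRollBack([1.2, 1.1, 9.5, 1.3, 1.0], loadTimeGuardrail); // false: one slow outlier
shouldRollBack([1.2, 3.4, 4.1, 3.8, 5.0], loadTimeGuardrail); // true: 4 of the last 5 are over 3s
```

Looking at a window of samples rather than a single reading stops one slow page load from triggering a rollback.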
You should make sure you have the same metrics across the old and new systems, so you can accurately compare. Maybe some Grafana dashboards or something that can give you an idea of how things are looking for both implementations.
We recently migrated our customer support call tooling to a more modern platform. We built side-by-side dashboards with exactly the same metrics, one for the new system and one for the old system, measuring things like answer rate, queue length, the percentage of agents on a call. If the results are dramatically different across both, we know something’s wrong. We also worked with the operations folks in the calls team to figure out what percentages of those metrics would be unacceptable and mean that we needed to turn the new system off and roll back.
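The side-by-side comparison can be sketched as a function that checks each metric on the new system against the old one. This is only an illustration: the metric names, the sample numbers and the idea of expressing each tolerance as a maximum relative difference are assumptions, not the thresholds our calls team actually agreed. `divergentMetrics` is a hypothetical name.

```typescript
type Metrics = Record<string, number>;

// Compares each metric on the new system to the old one. `tolerances` maps a
// metric name to the largest relative difference we're prepared to accept.
function divergentMetrics(oldSystem: Metrics, newSystem: Metrics, tolerances: Record<string, number>): string[] {
  const problems: string[] = [];
  for (const [metric, maxRelativeDiff] of Object.entries(tolerances)) {
    const before = oldSystem[metric];
    const after = newSystem[metric];
    if (before === undefined || after === undefined) {
      // Both dashboards should measure exactly the same things.
      problems.push(`${metric}: missing on one side`);
      continue;
    }
    const relativeDiff =
      before === 0 ? (after === 0 ? 0 : Infinity) : Math.abs(after - before) / Math.abs(before);
    if (relativeDiff > maxRelativeDiff) problems.push(`${metric}: ${before} (old) vs ${after} (new)`);
  }
  return problems;
}

// Made-up numbers for illustration.
const oldCalls = { answerRate: 0.95, queueLength: 10, agentsOnCallPercent: 70 };
const newCalls = { answerRate: 0.93, queueLength: 18, agentsOnCallPercent: 68 };
const callProblems = divergentMetrics(oldCalls, newCalls, {
  answerRate: 0.05,
  queueLength: 0.5,
  agentsOnCallPercent: 0.1,
});
// callProblems: ["queueLength: 10 (old) vs 18 (new)"]
```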
If you do hit one of those guardrails and you make the decision to roll back, you need to make sure you’ve got a well-documented, step-by-step plan to do so. Anyone should be able to follow this plan, not just you - especially if they’re paged in the middle of the night.
With the Typescript migration, we didn’t have a rollback plan. There was no going back for us. We didn’t foresee a situation where things were so bad that we had to roll back. We’d validated early on that there was negligible impact on build times when Typescript was added into the mix, and it all just compiles down to Javascript in the end anyway.
Other migrations have a lot more at stake. My colleague Suhail spoke at StaffPlus last year about some of the migrations that our platform teams have done. These migrations touch the thousands of microservices and database tables that literally power the bank. When critical functionality is in the hot path for your migration, you need to make sure there’s a clearly defined escape plan if things go wrong. This could be a full rollback, or a partial rollback to a previous state that you know is safe.
There will come a point where it’s no longer safe or possible to roll back. Ideally this should be closer to the end of your migration. Generally risks tend to show themselves earlier in the process anyway.
Make sure these points of no return are also well-documented, as it can actually be quite destructive if someone tries to roll something back when you’ve already passed this point.
And when you do roll things out, make sure you’re doing it as safely as possible.
If you can, test migration procedures on less important things like services or apps that nobody really uses.
You could run things in “shadow mode”, where the new system runs in production alongside the old system. The old system handles all of the traffic as normal. The new system may handle all of the traffic too, or at least some of it: it does exactly what it’s meant to in production, but as a dry run. It doesn’t actually affect anything that’s happening in the live system. It might process and log out some data, which you can then verify to make sure it’s working as expected. Think of it like your request going down two near-identical forked paths, where you only use the result from one of them: the old one. The other is just for learning and verifying.
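A minimal sketch of that forking idea for a request/response style system, assuming the old and new implementations expose the same async handler signature. `Handler`, `ShadowReport`, `withShadow` and the lookup functions are hypothetical; a real setup would also need to guarantee the shadow path can’t write to anything live.

```typescript
type Handler<Req, Res> = (request: Req) => Promise<Res>;

interface ShadowReport<Req, Res> {
  request: Req;
  live: Res;
  shadow?: Res;
  error?: unknown;
}

// Wraps the old (live) handler so every request is also sent to the new
// (shadow) handler. Only the live result is ever returned to the caller.
function withShadow<Req, Res>(
  live: Handler<Req, Res>,
  shadow: Handler<Req, Res>,
  report: (mismatch: ShadowReport<Req, Res>) => void,
  isSame: (a: Res, b: Res) => boolean = (a, b) => JSON.stringify(a) === JSON.stringify(b),
): Handler<Req, Res> {
  return async (request) => {
    const liveResult = await live(request);
    // Deliberately not awaited: the shadow path can't slow down or break the live response.
    void Promise.resolve()
      .then(() => shadow(request))
      .then(
        (shadowResult) => {
          if (!isSame(liveResult, shadowResult)) report({ request, live: liveResult, shadow: shadowResult });
        },
        (error) => report({ request, live: liveResult, error }),
      )
      .catch(() => {
        // A failing reporter must not surface in the live path either.
      });
    return liveResult;
  };
}

// Usage: hypothetical old and new implementations of the same lookup.
const oldLookup = async (id: number) => ({ id, status: "active" });
const newLookup = async (id: number) => ({ id, status: id === 42 ? "closed" : "active" });
const lookup = withShadow(oldLookup, newLookup, (m) => console.warn("Shadow mismatch", m));
```

The shadow call only starts once the live result is ready and is never awaited, so a slow or broken new system can’t hold up or fail the real response; its errors and mismatches are reported rather than thrown.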
We ran a pilot rollout for the support calls rebuild with a small group of our customer service agents. We couldn’t just do a phased rollout across the teams, as we needed to be able to control variables such as how many agents from each team we had online, and make sure they were well-versed in how to use the new system.
We feature-flagged the new system on for them and redirected a proportional percentage of inbound call traffic to it. This small, controlled pilot allowed us to quickly pick up any bugs, get fast feedback from users, and switch it off very quickly if any of our guardrails were hit or we weren’t happy with the performance of the new system.
Like growing out a fringe, there’s probably going to be a really awkward in-between phase. You’ll have the old and new technologies alongside each other, and you’ll have to maintain and make changes to both.
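Our pilot picked its agents by hand, but the traffic split and the off switch could be sketched roughly like this. It assumes each inbound call has a stable ID and that the flag is read from some feature-flag service. `PilotFlag`, `bucketFor` and `routeCall` are hypothetical, and the hash is an FNV-1a-style hash over UTF-16 code units, chosen here only because it’s short.

```typescript
interface PilotFlag {
  enabled: boolean; // kill switch: set to false to send everything back to the old system
  percentToNewSystem: number; // 0 to 100
}

// FNV-1a-style 32-bit hash, so the same call ID always lands in the same bucket (0-99).
function bucketFor(id: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < id.length; i++) {
    hash ^= id.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash % 100;
}

function routeCall(callId: string, flag: PilotFlag): "new" | "old" {
  if (!flag.enabled) return "old";
  return bucketFor(callId) < flag.percentToNewSystem ? "new" : "old";
}

// Usage: a hypothetical 10% pilot.
const pilot: PilotFlag = { enabled: true, percentToNewSystem: 10 };
const destination = routeCall("inbound-call-0001", pilot); // "new" or "old", but always the same for this ID
```

Hashing the call ID rather than picking at random means the same call is always routed the same way, and setting `enabled` to false sends everything back to the old system in one step.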
You may have to accept some temporary slowdowns in processes, such as additional build steps. We had to run both Flow and Typescript compilers in the build process throughout the migration, which did add a few seconds onto every build.
Keep on trucking, and eventually you’ll get to the point where you can remove the old one.
And we did it. In December last year I was running the migration stats script more often to watch the number tick up: 97, 98, 99. Towards Christmas I PR’d a change that migrated the last remaining Flow files in the main app.
When I came back from my break in January, I removed Babel from our biggest app to see what would happen, expecting to find that we’d forgotten something or that some setting was misconfigured. But no, it just worked.
- Switching over to Next’s SWC compiler shaved off a bit of time from local dev and from CI as well.
- We started feeling the benefits sooner than I anticipated with regards to upgrading packages. After our core GraphQL packages were migrated, I was able to upgrade Apollo Client which we’d been holding off on because there weren’t any Flow types for version 3.
- Concerningly, we found a lot of type errors while migrating that were caused by genuinely missing params and props. Flow had somehow completely missed them, but Typescript didn’t.
- The Flow server used to slow down our IDEs massively, and crashed on top of that. Without it, and using something like VSCode, which is built for Typescript, it’s a dream. Intellisense actually works now. It’s just nicer to use.
When you do finish, make sure you celebrate it. It’s not an easy task and it can take a really long time, as it certainly did in our case — recognise everyone who contributed and saw it through with you.
Post messages in public forums — this one was in our company-wide engineering announcements channel. Make sure engineering leadership know what you’ve all achieved.
We got stickers made, because everyone loves a sticker to show they took part in something. We also had a party, and I ordered little cakes with Typescript toppers.
So that’s our story. How about yours?
Next time a migration opportunity presents itself, here are some things to think about before you take the leap.
Do you even need to do this migration at all?
As time goes on, will you be in the *same* place, or a *worse* place?
The decision to introduce a new technology needs to be well-considered. You need to have a really clear idea of the benefits that you’re expecting to get from it. Because all benefits are relative to the effort required to actually do the migration.
How much do you value that benefit in relation to the number of developer hours required to actually get there?
Is there a point at which the effort outweighs the value?
This becomes more of an issue the larger your codebase gets and the more complex your organisational needs are.
For Typescript, compilation time was a nice bonus, but really we knew we couldn’t afford to be stuck on an obsolete type system that fewer and fewer people knew how to use. It did take us a long time to get there, but it’s going to continue to be worth it every time we are able to upgrade an important package because there are types for it, or when we hire a new web developer who has some experience with Typescript already.
Technology is not free once you’ve implemented it. There’s maintenance to think of.
Who’s going to look after this new technology? This includes staying on top of vulnerabilities, being on-call if that’s relevant, major version upgrades, debugging when things go wrong.
Make sure you don’t just hand it over to an infra or platform team and expect them to magically know how to look after it.
Does more than one person know how this technology works?
If you left the company would anyone be able to pick up where you left off?
Are the other engineers in the company ready to change their ways of working?
Do they reasonably have time to learn how to use the new thing?
E.g. GraphQL requires a paradigm shift in the way you build APIs. If you keep building APIs that expose everything, you don’t really save any computation.
Will you be able to hire new people to use this technology or is it too obscure?
If you started afresh today, and chose this new technology to build your codebase in, you know that you’d be going through the exact same thing a few years down the line. There are always going to be new technologies, in the same way that all code is legacy code as soon as it’s merged into main.
That said, you don’t have to do a migration every time something new comes out.
There’s a vast difference between picking a new tool because it seems cool, and picking a new tool because it’s genuinely going to be better.
Many of us have built workarounds and complex functionality that have been superseded by newer libraries and tools. We’ve got a microfrontends setup that would have massively benefited from Webpack 5’s module federation if it had been around when we built the infrastructure. Here, the benefits speak for themselves.
Things become end-of-life, as well. We might put off major version upgrades of things because we’re intimidated by the amount of work required, but it becomes a necessity at the point where the library won’t be updated any more and you’re at risk of software vulnerabilities not being patched.
Running old versions of things can also leave you at risk of incidents.
Things that seemed like a good idea at the time may end up being a poor decision.
Have you ended up with the technological equivalent of HD-DVD in your codebase?
It was the competitor to Blu-Ray. The early 21st century Betamax. Blu-Ray won the war of the standards and the HD-DVD disappeared, never to be seen again.
Many of us will have bet on the wrong horse when it comes to technology decisions. For us, Flow was one of those decisions.
We made a choice in the past, and it turned out to not have been the right one. We certainly weren’t the only ones, and at the time it made a lot of sense. These things happen.
Sometimes a library or technology will be deprecated in favour of another, and as we found, that can leave you at a disadvantage when it comes to upgrading things or taking advantage of advances in technologies.
As well as a load of good reasons, there are plenty of ill-advised reasons to do a migration.
Sometimes companies migrate to new technologies because it seems like that’s what the industry is doing now.
Don’t assume that just because a well-respected tech company is using a particular technology, you should too.
When everyone was adopting React, I wonder how many of them stopped to think about whether they actually needed something that was built for a Facebook-scale application. We use it internally at Monzo for our customer service application, but that’s massive and complex and has loads of people contributing. I think it’s a pretty good use case. But for a static page, how much of it is just excess Javascript you’re sending down the wire? Can you achieve the same effects with a static site generator?
Maybe you joined a new company or came onto a new project where they’re using a technology or library you aren’t familiar with, or don’t like.
I was given some good advice by my boss at a new company once when I said I didn’t like a library they had chosen: “think about whether it’s actually worse, or just different”. That’s really good advice. Is it just different from what you’re used to?
In my case I do think that it was worse and I was right, but the principle is a good one. Is it actually worse or are you just not used to it? By not opening your mind to different technologies from the one you’ve used, are you holding yourself - and your work - back?
Even if the new technology actually is better, do you have a good plan for how you’re going to get there?
Perhaps there’s a new language you’ve been playing with in your spare time and you wish you could use it all the time.
Or maybe there’s massive hype around a new technology, and it seems like it’d be good for some reason or another.
There are so many new libraries and frameworks and languages out there, and it’s exhausting trying to keep up. Many of those new technologies will come and go before you’ve even tried them out.
It’s really important to consider whether a new technology is actually better than what you’ve got now.
There’s nothing wrong with boring! Boring works. Boring is tried and tested. Boring is easier to maintain.
This is from Dan McKinley’s essay “Choose Boring Technology”, which he turned into a great talk you can read through at boringtechnology.club.
“Boring” should not be conflated with “bad.” There is technology out there that is both boring and bad. You should not use any of that. But there are many choices of technology that are boring and good, or at least good enough. MySQL is boring. Postgres is boring. PHP is boring. Python is boring. Memcached is boring. Cron is boring.
These technologies are familiar and reliable. People know how to use them. There’s lots of documentation. They’re well-maintained. Sure, they aren’t new and exciting, but they will work and they’ll do what you need them to do.
I hope these lessons will give you a framework to tackle your next migration, whether you’re at a big company or a small agency.
And remember: if you do a migration, do it wisely. And maybe you don’t need to do it at all.