
Ask HN: Inherited the worst code and tech team I have ever seen. How to fix it?

source link: https://news.ycombinator.com/item?id=32883596

237 points by whattodochange 9 hours ago | hide | past | favorite | 340 comments
I have to find a strategy to fix this development team without managing them directly. Here is an overview:

- this code generates more than 20 million dollars a year of revenue

- it runs on PHP

- it has been developed for 12 years directly on production with no source control ( hello index-new_2021-test-john_v2.php )

- it doesn't use composer or any dependency management. It's all require_once.

- it doesn't use any framework

- the routing is managed exclusively as rewrites in nginx (the nginx config is around 10,000 lines)

- no code has ever been deleted. Things are just added. I gather the reason is that it was developed directly on production, and deleting things is too risky.

- the database structure is the same mess, no migrations, etc... When adding a column, because of the volume of data, they add a new table with a join.

- JS and CSS are the same. Multiple versions of jQuery fighting each other depending on which page you are on, or even on the same page.

- no MVC pattern of course, or whatever pattern. No templating library. It's PHP 2003 style.

- in many places I see controller-like files making curl requests to the app's own REST API (via the public domain name, not localhost), doing OAuth authorization, etc... just to get the menu items or a list of products...

- no caching (memcached is installed, but it is only used for sessions...)

- team is 3 people, quite junior. One backend, one front, one iOS/android. Resistance to change is huge.

- productivity is abysmal which is understandable. The mess is just too huge to be able to build anything.

This business unit has a pretty aggressive roadmap, as management and HQ have no real understanding of these blockers. And post-COVID, the budget is really tight.

I know a full rewrite is necessary, but how to balance it?

First off, no, a full rewrite is not only not necessary, but probably the worst possible approach. Do a piece at a time. You will eventually have re-written all the code, but do not ever fall into the trap of a "full re-write". It doesn't work.

But before you re-write one line of code - get some testing in place. Or, a lot of testing. If you have end-to-end tests that run through every feature that is currently used by your customer base, then you have a baseline to safely make changes. You can delete code as long as the tests pass. You can change code as long as the tests pass.

Once you are at that point, start picking off pieces to modernize and improve.

Also, respect the team. Maybe they aren't doing what you would, but they are keeping this beast alive, and probably have invaluable knowledge of how to do so. Don't come in pushing for change... come in embracing that this beast of a codebase makes 20 million a year. So talk about how the team can improve it, and modernize their skills at the same time.

Because if you walk in, saying, "This all sucks, and so do you, lets throw it out", do you really have to wonder why you are hitting resistance?

My first instinct was "get some testing in place" too. That served me well in recent projects where I was in a similar situation. I was wondering if anyone has advice on how to make sure your tests are... comprehensive? I was fortunate enough to have full flow tests in place from the beginning and a great team which knew the intricacies of the subject matter. We made lists of use cases and then tried to find orthogonal test cases. But that was my naive approach; I'm wondering if there are better methods out there, especially when there is zero testing.
I fully agree with this, but I think it misses a key step:

As the team’s manager, it’s your job to get buy-in from the executives to gradually fix the mess. You don’t need to tell the team exactly how to fix it, but you gotta get buy-in for space to fix it.

One approach is just to say “every Friday goes to adding tests!” (And then, when there's some reasonable test coverage, make Fridays go to refactorings that are easy with the new tests, and so on.)

But this often fails because when Friday comes, something is on fire and management asks to please quickly squeeze this one thing in first.

The only other approach I know of is to get buy in for shipping every change slightly slower, and making the code touched by that change better. Eg they want to add feature X, ok add a test for adjacent existing functionality Y, then maybe make Y a little better, just so adding X will be easier, then build X, also with tests. Enthusiastically celebrate that not only X got shipped but Y also got made better.

If the team is change averse, it’s because they’re risk averse. Likely with good reason, ask for anecdotes to figure out where it comes from. They need to see that risk can be reduced and that execs can be reasonable.

You need the buy-in, both from the execs and the team. Things will go slightly slower in the beginning and it's worth it. Only you can sell this. The metaphor of “paying off technical debt” is useful here since the interest is sky high and you want to bring it under control.

Before anything else, getting buy-in for any kind of major change from the execs is key. Explain the situation and the effects. Have everything in writing, complete with date and signatures. Push back hard every time this commitment gets sabotaged because something is supposedly on fire. Get a guaranteed budget for external trainings and workshops, again in writing. Then talk to the team.

If you cannot get those commitments in writing, or later on get ignored multiple times: run. Your energy and sanity are better spent elsewhere. No need to fight an uphill battle alone – and for what? The company just revealed itself for what it is and you have no future there.

First I’d do that, then think about the engineering part.

Yeah, there's a process. It's something that I've done a bunch of times for a bunch of clients.

There's so much low-hanging fruit there that's so easy to fix _right now_. No version control? Good news! `git init` is free! PHPCS/PHP-CS-fixer can normalise a lot, and is generally pretty safe (especially when you have git now). Yeah, it's overwhelming, but OP said that the software is already making millions - you don't wanna fuck with that.
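To make the low-hanging fruit concrete, a first hour of that work might look something like this sketch; the directory, ignore entries, and committer identity are placeholders, not anything from the thread:

```shell
# Hypothetical first-day bootstrap: snapshot the live tree into version
# control before changing anything. APP_ROOT is an assumption; it falls
# back to a scratch directory so the sketch is safe to run anywhere.
cd "${APP_ROOT:-$(mktemp -d)}"

git init -q

# Keep logs and secrets out of history from day one.
printf '%s\n' '*.log' 'config.secret.php' > .gitignore

git add -A
git -c user.name=ops -c user.email=ops@example.com \
    commit -q -m "Baseline: the code exactly as it runs in production"

# The formatting pass is a dry run only; nothing on disk is modified.
if command -v php-cs-fixer >/dev/null 2>&1; then
  php-cs-fixer fix --dry-run --diff . || true
fi
```

The point of the single "baseline" commit is that every later change, however small, becomes a reviewable diff against the code that is known to make money.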

I've done it, I've written about it, I've given conference talks about it. The real bonus for OP is that the team is small, so there's only a few people to fight over it. It's pretty easy to show how things will be better, but remember that the team are going to resist deleting code not because they're unaware that it's bad, but because they are afraid to jeopardize whatever stability they've found.

And also, starting by fixing the JS/CSS/HTML front end is likely the safest, as it won't corrupt any customer data, and it will be visible when something breaks. That can probably be the next best candidate for a major overhaul. I'd also hope that a $20M/year project can afford to hire someone senior in addition to these 3 juniors?
> It doesn't work.

That's simply not true. I've inherited something just as bad as this. We did a full rewrite and it was quite successful and the company went on to triple the revenue.

> get some testing in place

Writing tests for something that is already not functional will be a waste of time. How do you fix the things that the tests prove are broken? It is better to spend the time figuring out what all the features are, document them, and then rewrite, with tests.

The problem with people new to the company starting a rewrite from scratch is that they often are poorly informed on why things were the way they were before. If you start big, you can have bad outcomes where the new system might be objectively worse than the old one... but you are stuck trying to get the new thing out for the next 5 years because too many people sunk too much political capital into it.

As an example, I worked at an ad-tech startup that swapped its tech team out when it had ~100 million in revenue (via acqui-hire shenanigans). The new tech team immediately committed to rewriting the code base into Ruby micro-services and were struck by strange old tech decisions like "why does our tracking pixel return a purple image?". The team went so far as to stop anyone from committing to the main service for several years in a vain attempt to speed up the rewrite/architecture migration.

These refactors inevitably failed to produce a meaningful impact on revenue; as a matter of fact, the company's revenue had begun to decline. The company eventually did another house cleaning of the tech team and had some minor later successes - but this whole adventure effectively cost their entire Series D round along with 3 years of product development.

You're making a silent assumption that the original team is well informed about why things are the way they are, and that they know what they are doing. I think that is not always the case.

I was on a project once where the mess in the original system was the result of the original team not knowing what they were doing and just doing permutation-based programming - applying random changes until it kinda worked. The situation was very similar to what the OP describes. They even chose J2EE just because the CTO heard other companies were using it, despite not having a single engineer who knew J2EE. Overall, after a year of development the original system barely even worked (it required manual intervention a few times per day to keep running!), and even an imperfect rewrite done by a student was already better after 2 weeks of coding.

So I believe the level you're starting the rewrite from is quite an important factor.

Then of course there is a whole world of difference between "they don't know what they are doing" and "I don't like their tech stack and want to master <insert a shiny new toy here>". The former can be recognized objectively by:

- very high amount of broken functionality

- abysmal pace at which new features are added

We (my good friend and I, who both have 20+ years of experience) were brought in specifically to do the rewrite. We were new to the company. We actually had to rebuild the entire IT department while we were at it as well.

> new tech team immediately committed to rewriting the code base into ruby micro-services

well... sigh.

> These refactors inevitably failed to produce a meaningful impact to revenue

It sounds like less about the refactor itself and more about the skills of the team doing the refactor. You certainly can't expect a refactor to go well if the team makes poor decisions to begin with.

You have a great experience and did a great job indeed. My only question is how does one get 20 years of such experience without horrific flashbacks of “let’s just rewrite it” decisions. Do you do rewrites/redesigns often? What’s your success rate?
> We were brought in specifically to do the rewrite.

That's the key difference. The stakeholders should always be in on the rewrite.

> why does our tracking pixel return a purple image?

Now I'm really curious, is there some exciting non-obvious reason for a tracking pixel to be purple? Was it #FF00FF or more like #6600DD?

This definitely needs an answer.

In fact, until OP can give us the right answer, we immediately need even wrong answers!

You reading this. Yes, you. Give your best wrong answer below.

The problem is that most developers are crap and self-centered, wanting to work only with the tech they like.

You need to work with someone who doesn't care about filling up their CV with "ruby microservices" and get stuff done.

If I went into a business to do a rewrite and decided to use $shinyNewTech because I want to build up rust experience I'd probably end up wasting years with little results.

You don't need comprehensive tests for tests to start delivering value.

Figure out the single most important flow in the application - user registration and checkout in an e-commerce app, for example.

Write an automated end-to-end test for that. You could go with full browser automation using something like Playwright, or you could use code that exercises HTTP endpoints without browser automation. Either is fine.
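For the no-browser option, the whole smoke check can be a handful of shell. Everything below is a placeholder sketch: the BASE_URL default and the example paths are invented, not taken from the thread:

```shell
# Minimal HTTP smoke check: assert status codes on the money-making flow.
# BASE_URL and the example paths are placeholders for the real application.
BASE_URL="${BASE_URL:-http://127.0.0.1:8080}"

check() {
  local path="$1" expect="$2" code
  code=$(curl -s -o /dev/null -w '%{http_code}' "$BASE_URL$path")
  if [ "$code" = "$expect" ]; then
    echo "ok   $path -> $code"
  else
    echo "FAIL $path -> $code (expected $expect)" >&2
    return 1
  fi
}

# Example invocations (uncomment once BASE_URL points at the app):
#   check /              200
#   check /login         200
#   check /cart/checkout 200
```

A non-zero exit from any `check` is enough for CI to flag the run, which is all an early-warning system needs.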

Get those running in GitHub Actions (after setting up the git scraping trick I described here: https://news.ycombinator.com/item?id=32884305 )

The value provided here is immense. You now have an early warning system for when someone breaks the flow that makes the money!

You also now have the beginnings of a larger test suite. Adding tests to an existing test suite is massively easier than starting a new test suite from scratch.
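A workflow for this could look roughly like the fragment below; the file path, job name, script location, and secret name are all assumptions for illustration:

```yaml
# .github/workflows/smoke.yml (hypothetical)
name: smoke
on:
  push:
  schedule:
    - cron: '0 * * * *'   # hourly, so breakage is caught even without pushes
jobs:
  smoke:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run HTTP smoke checks
        run: ./tests/smoke.sh
        env:
          BASE_URL: ${{ secrets.SMOKE_BASE_URL }}
```

The scheduled trigger matters for a codebase like this one, where most changes still happen outside of version control.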

You're assuming the existing flow is working perfectly and I agree with you that testing is a godsend. I constantly yell that testing is great. Heck, I even worked for Pivotal Labs that does TDD and pair development, and loved it.

Let's say you start to write tests and start to see issues crop up. Now what? How do you fix those things?

Github actions!? They don't even have source control to begin with. There are so many steps necessary to just get to that point, why bother?

If the existing code base already has extremely slow movement and people are unwilling to touch anything for fear of breaking it... you're never going to get past that. Let's say you do even fix that one thing... how do you know it isn't breaking something else?

It is a rat's nest of compounding issues and all you are doing is putting a band-aid on a gushing open wound. Time to bring in a couple of talented developers and start over. Define the MVP that does what they've learned their customers actually need from their 'v1' and go from there. Focus on adding features (with tests) instead of trying to repair a car that doesn't pass the smog test.

> Let's say you start to write tests and start to see issues crop up. Now what? How do you fix those things?

I assumed the tests wouldn't be for correctness, but for compatibility. If issues crop up, you reproduce the issues exactly in the rewrite until you can prove no one depends on them (Chesterton's fence and all).

The backwards-compatibility-at-all-costs approach makes sense if the product has downstream integrations that depend on the current interface. If your product is self-contained, then you're free to take the clean slate approach.

Source control seems like a straightforward first step, regardless of what approach is going to be taken going forward
One would think, but how do you go from source control to deployment on the production server? If they were editing files on the server directly, there could be a whole mess of symlinks and whatever else on there. Even worse, how do you even test things to see if you break anything?

It is a can of worms.

Doesn't Git support symlinks? Empty directories could be trouble though. One would have to put a .gitkeep into every directory before check-in, and add a step at deployment time to remove them again.
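For what it's worth, both behaviors are easy to confirm in a scratch repo: Git stores symlinks as link objects (mode 120000), while empty directories are simply invisible to it. The filenames below are made up for the demonstration:

```shell
# Scratch-repo demonstration: symlinks are tracked, empty dirs are not.
cd "$(mktemp -d)"
git init -q

ln -s /etc/hostname link.txt   # symlink: stored as a link object
mkdir logs                     # empty directory: git cannot track it
touch logs/.gitkeep            # conventional placeholder file

git add -A
git ls-files -s                # link.txt shows mode 120000
```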
Yeah, a lot of worms... and if things break while refactoring, you are on the hook for scanning through that complex monster at 3 am, finding the issue, and fixing it - in most cases for no additional pay.
The biggest problem isn't even the codebase in this situation.

When you keep finding bugs like that while refactoring and making things better, it will demoralise you. Productivity will stop when that happens.

It also requires above-average engineers to fix the mess and own it, for which there is not much benefit.

Your refactoring broke things? Now it's your turn to fix them while also shipping the deliverables you were originally hired for, and to get paged for things that weren't your problem before.

If I were a manager assigning this kind of refactoring work, I would attach a significant bonus; otherwise I know my engineers would start thinking of switching to other places unless we paid big-tech salaries.

People keep quoting Joel's post about why refactoring is better than rewrite but if your refactor is essentially a rewrite and your team is small or inexperienced - it's not clear which is better.

Parallel construction and slowly replacing things is a lot of unpaid work. Just the sheer complexity of doing it bit by bit, piece by piece, is untenable for a 3-person team where the other two most likely won't want to get into it.

How do 2 junior devs manage to rewrite the entire product while also meeting the ongoing goals of the business?

You're trying to spec features on a moving target.

Even if they were able to do 50% time on the rewrite you'll never actually get to feature parity.

The only viable plan, unless the company has an appetite to triple the dev headcount, is to set an expectation that features will have increased dev time; then, as you spec new features, you also spec out how far you will go into the codebase, refactoring what the new features touch.

But it is functional. Grandparent post is suggesting that all the currently used functionality should have tests written for it. It makes sense, as that way they can gather the requirements of a rewrite at the same time.
We don't know that it is functional... maybe the company is only making $20m and should be making $60m. Like I said, we tripled the revenue with a rewrite.

What we did was make the case that we could increase revenue by being able to add valuable features more easily/quickly. We started with a super MVP rewrite that kept the basic valuable features, launched, then spent the rest of our time adding features (with tests). Hugely successful.

The key, of course, will be to get 1-2 top-notch developers in place to set things up correctly from the beginning. You're never going to be effective with a few juniors who don't have that level of experience.

> We don't know that it is functional... maybe the company is only making $20m and should be making $60m. Like I said, we tripled the revenue with a rewrite.

It's $20m functional. It's possible it could be better but unless this is the kind of huge org where 20m is nothing (doesn't sound like it) you really need the behaviors documented before you start screwing with it. It's very likely this thing has some pretty complex business logic that is absolutely critical to maintain.

> you really need the behaviors documented before you start screwing with it. It's very likely this thing has some pretty complex business logic that is absolutely critical to maintain.

Nothing I said suggested otherwise. Absolutely critical for whomever is doing a rewrite to understand everything they can about the application and the business, before writing a single line of code.

You sound frustrated that you've joined a company with an absolute stinker of a codebase, because you're confident you could deliver much better results having refactored it first. You're managing a group of people probably enormously under-productive because of the weight of the technical debt they're under. Every change takes months. It's riddled with hard-to-fix bugs. It's insecure. There are serious bus factor problems.

Many of us have been in this exact position before, multiple times. Many of us have seen somebody say "our only choice is a full rewrite" - some of us were the one making that decision. Many of us have seen that decision go disastrously wrong.

For me, the problem was my inability to do what I'm good at: write tests, write implementations that pass that test, etc. Every time I suggested doing something, somebody would have a reason why that would fail because of some unclear piece of the code. So rather than continuously getting blocked, I tried to step into my comfort zone of writing greenfield code. I built a working application that was a much nicer codebase, but it didn't match the original "spec" from customer expectations, so I spent months trying to adjust to that. I basically gave up managing the team because I was so busy writing the code. In the end, I left and the company threw away the rewritten code. They're still in business using the shitty old codebase, with the same development team working on it.

If you really want to do the rewrite, accept how massively risky and stressful it will be. The existing team will spend the whole time trying to prove you were wrong and they were right, so you need to get them to buy into that decision. You need to upskill them in order to apply the patterns you want. And you need to tease apart those bits of the codebase which are genuinely awful from those that are merely unfamiliar to you.

Personally, I would suggest a course for you like https://www.jbrains.ca/training/course/surviving-legacy-code, which gives you a wider range of patterns to apply to this problem.

Maybe this was meant as a reply to the main post?
> How do you fix the things that the test prove are broken?

Uhm. The tests don't do any such thing.

> It is better to spend the time figuring out what all the features are, document them

Yes. And the tests you should write are executable documentation showing how things are. It is like taking a plaster cast of a fossil. You don't go “I think this is how a brachiosaurus fibula should look” and then try to force the bones into that shape. You mould the plaster cast (your tests) to the shape of the fossil (the code running in production). Then if during excavation (the rewrite) something changes or gets jostled, you will know immediately, because the cast (the tests) no longer fits.

> We did a full rewrite and it was quite successful and the company went on to triple the revenue.

Which sure beats some other company coming along and "rewriting" the same or similar functionality in a competing product and killing your own revenue. But it does come down to how big the codebase is and how long it would take for an MVP to be a realistic replacement. If there are parts that are complex but unlikely to need changing soon you can usually find ways to hide them behind some extra layer. Is there any reason you couldn't just introduce proper processes (source control, PRs, CI/CD etc.) around the existing code though?

Of course a full rewrite can be successful. This is the problem when people base their entire critical thinking on blog posts. They then go on to preach it everywhere as well!
Exactly. If they write tests, they will be just doing TDD where the specification becomes a problem in itself.
This is the point: I don't TDD, but I am a big fan of tests. In this case the incorrect spec can be flagged, but all the other incorrect specs will also be there. If your fix doesn't break a spec, great, but if it does, you can check whether that spec was correct. It's a back and forth between code and business requirements.
It is a 12-year-old legacy product. What specification exists other than, "Yesterday it did X when I clicked the button, but now it does not do that anymore"?
Yep.

It's also a juggling job from hell so keep a cool head and seek support and resources for what needs to be done.

A big first step is to duplicate and isolate, as much as possible, a "working copy" of the production working code.

You now need to maintain the production version, as requests go on, while also carving out feasible chunks to "replace" with better modules.

Obviously you work against the copy, test, test again, and then slide a replacement into the live production monolith... with bated breath and an "in case of fubar" plan in the wings.

If it's any consolation (and no, no it isn't), this scenario is surprisingly common in thriving businesses.
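One hedged sketch of the duplicate-and-isolate step; rsync is a common tool for it, and the paths here are placeholders (in practice SRC would point at the production tree, e.g. over ssh):

```shell
# Take an isolated working copy to experiment against. SRC would be the
# production tree (e.g. prod:/var/www/app/ over ssh); the local defaults
# here are placeholders so the sketch runs anywhere.
SRC="${SRC:-./app/}"
DEST="${DEST:-./app-working-copy/}"
mkdir -p "$SRC"

# -a preserves permissions and symlinks; trailing slashes mean "copy contents".
rsync -a --exclude='*.log' "$SRC" "$DEST"
ls -A "$DEST"
```

Repeating the same command later re-syncs the copy, so the working copy can be refreshed as production keeps changing underneath you.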

This approach is a trap.

Management need to know that this needs a rewrite and a more capable team, and that pursuing an aggressive roadmap while things are this bad is impossible.

If they say no, and you try to muddle your way through it anyway, you are setting yourself up to fail.

If they say yes, ask for the extra resources necessary to incrementally rewrite. I would bring in new resources to do this with modern approaches and leave the existing team to support the shrinking legacy codebase.

Why would the existing team stick around knowing their jobs would be slowly rewritten into oblivion by others?
Where else are they going to go if they prefer this mess?

Why would they need to be replaced if they’re ultimately convinced to enter the 21st century?

Yeah, I agree, a full rewrite from scratch is almost never the right approach. It puts you in a tunnel where you cannot add anything useful to production for months, you have no idea when you can finally ship the whole thing, and when you do, it will be very risky.

Do things progressively. Read the code, figure out the dependencies, find the leaves, and start by refactoring those. Do add tests before changing anything, to make sure you know when you change existing behaviors.

Figuring out such a code base as a whole might be overwhelming, but remember that it probably looks much more complicated than it actually is.
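One crude way to start finding those leaves in a require_once-style tree is to count how often each file is pulled in; files that never show up in the list have nothing depending on them. The paths and tree layout here are illustrative assumptions:

```shell
# Count how often each PHP file is pulled in via require/include.
# Files that never appear in this output are "leaves": nothing depends
# on them, so they are the safest first candidates for refactoring.
cd "${APP_ROOT:-.}"

grep -rhoE '(require|include)(_once)?[^;]*' --include='*.php' . \
  | grep -oE '[A-Za-z0-9_./-]+\.php' \
  | sort | uniq -c | sort -rn | head -20
```

This is only a text-level approximation (dynamic includes built from variables won't show up), but it is enough to sketch the dependency graph of a 12-year-old tree in a few minutes.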

In a team with only two people working on the monster it seems reasonable that they’d be able to manage two development streams at the same time.
Do it in small pieces and you'll be there forever - it'll never get done.

Map out the functionality related to the (hard) requirements and kick off replacing the product(s) with something modern and boring.

Your suggestion sounds like the strangler fig pattern. While a valuable strategy in some cases, it does present the risk of duplicating poor architecture choices into the new code.

I would normally opt for your suggested approach too. However, based on the description given, I’d most likely recommend a complete rewrite in this case. The architecture appears to be quite poor and the risk of infecting new code with previous bad decision-making may be too great.

Yes, same. Sometimes devs see Frankenstein code, get all emotional, and adopt a full-rewrite-or-die attitude. Maybe take a step back and migrate piece by piece.
Not good advice. I have been in OP's shoes: I inherited a project that was a clusterf, and did a full re-write. It was a lot of work (more than anticipated), but eventually it was very successful.

The original code was just not salvageable. (It was quickly done as a fast hack, and it would break left and right, causing outages).

The OP just needs to make sure they understand what the OG system is trying to do, and what it will take to re-write it into something sane. Don't start before understanding all the caveats of the system/project you are trying to re-write.

> First off, no, a full rewrite is not only not necessary, but probably the worst possible approach. Do a piece at a time. You will eventually have re-written all the code, but do not ever fall into the trap of a "full re-write". It doesn't work.

I've seen systems where the entirety of the codebase is such a mess, but is so tightly coupled with the business domain, that a rewrite feels impossible in the first place. Furthermore, because these systems are often already working, as opposed to some hypothetical new rewrite, new features also get added on top of the old systems, meaning that even if you could rewrite them, by the time you would have done so, it would already be out of date and wouldn't do everything that the new thing would do (the alternative to which would be making any development 2x larger due to needing to implement things both in the old and new versions, the new one perhaps still not having all of the building blocks in place).

At the same time, these legacy systems are often a pain to maintain, have scalability and stability challenges and absolutely should not be viewed as a "live" codebase that can have new features added on top of it, because at that point you're essentially digging your own grave deeper and deeper, waiting for the complexity to come crumbling down. I say that as someone who has been pulled into such projects, to help and fix production environments after new functionality crippled the entire system, and nobody else knew what to do.

I'd say there is no winning here. A full rewrite is often impossible, a gradual migration oftentimes is too complex and not viable, whereas building on top of the legacy codebase is asking for trouble.

> But before you re-write one line of code - get some testing in place. Or, a lot of testing. If you have end-to-end tests that run through every feature that is currently used by your customer base, then you have a baseline to safely make changes. You can delete code as long as the tests pass. You can change code as long as the tests pass.

This is an excellent point, though! Testing is definitely what you should begin with when inheriting a legacy codebase, regardless of whether you want to rewrite it or not. It should help you catch new changes breaking old functionality and be more confident in your own code's impact on the project as a whole.

But once again, oftentimes you cannot really test a system.

What if you have a service that calls 10 other services, which interact with the database or other external integrations, with tight coupling between all of the different parts? You might try mocking everything, but at that point you're spending more time making sure that the mocking framework works as expected, rather than testing your live code. Furthermore, eventually your mocked data structures will drift out of sync to what the application actually does.

Well, you might try going the full integration test approach, where you'd have an environment that would get tests run against it. But what if you cannot easily create such an environment? If there are no database migrations in place, your only option for a new environment will be cloning an existing one. Provided that there is a test environment to do it from (that is close enough to prod) or that you can sufficiently anonymize production data if you absolutely need to use it as the initial dump source, you might just run into issues with reproducibility regardless. What if you have multiple features that you need to work on and test simultaneously, some of which might alter the schema?

If you go for the integration testing approach, you might run into a situation where you'll need multiple environments, each of which will need their own tests, which might cause significant issues in regards to infrastructure expenses and/or software licensing costs/management, especially if it's not built on FOSS. Integration tests are still good, they are also reasonably easy to do in many of the modern projects (just launch a few containers for CI, migrate and seed the database, do your tests, tear everything down afterwards), but that's hard to do in legacy projects.

Not only that, but you might not even be fully aware how to write the tests for all of your old functionality - either you need to study the whole system in depth (which might not be conceivable), or you might miss out on certain bits that need to be tested and therefore have spotty test coverage, letting bugs slip through.

> Once you are at that point, start picking off pieces to modernize and improve.

It helps to be optimistic, but for a plethora of reasons, many won't get that far. Ideally this is what people should strive for and it should be doable, but in these older projects typically the companies maintaining them have other issues in regards to development practices and reluctance to introduce tools/approaches that might help them improve things, simply because they view that currently things are working "good enough", given that the system is still generating profits.

Essentially, be aware of the fact that attempts to improve the system might make things worse in the short term, before they'll get better in the long term, which might reflect negatively upon you, unless you have sufficient buy-in to do this. Furthermore, expect turnover to be a problem, unless there's a few developers who are comfortable maintaining the system as is (which might present a different set of challenges).

Ideally, start with documentation about how things should work, typical use cases, edge cases etc.

Then move on to tests, possibly focusing on unit tests at first and only working with integration tests when you have the proper CI/environment setup for this (vs having tests that randomly fail or are useless).

After that, consider breaking the system up into modules and routing certain requests to the new system. Many won't get this far and I wouldn't fault you for exploring work in environments that set you up for success, instead of ones where failure is a looming possibility.

This is exactly the right advice. A full rewrite might look good on the resume but will be a late, error-prone disaster.

Start with tests; can't emphasize this enough.

The big rewrite works - but only if you have a team you can trust. You need a new team of seniors to pair with the current team, promise a promotion to the current team at the end of the task.

Committing to an iterative approach is what I do when I don't have enough authority/ political tokens and I can't afford a rewrite.

Over time it gets less and less priority from the business, and you end up with half the codebase being crap and half being OK, and maintaining stuff is even harder.

Huh. You are literally saying do a full rewrite. But it's also the worst idea?

Edit: A full rewrite always meant replacing every part of a system. Whether you do it gradually doesn't really matter.

"Whether you do it gradually doesn't really matter."

It absolutely DOES matter. A gradual rewrite is much more likely to work than a stop-the-press rewrite.

It's still a rewrite. The crux of the statement I made.

The problem with a classic full rewrite is that the existing system is thrown away immediately. None of the existing features are available in production until the rewrite adds them back in, often incomplete, buggy, changed beyond all recognition, or a combination of all three. That obviously sucks and is the reason the classic rewrite is rarely done. However, it is clear that something must happen.
"Full rewrite" is a description of the end state, not the process.

The best way to do a full rewrite is incrementally, with test support and consideration for natural separation of internal subsystems.

He’s saying to Ship of Theseus the codebase. Don’t build a new ship and then burn down the old ship. Replace the old ship piece by piece in place.
That only works if the new pieces correspond to old pieces. If there's no good structure to build on, the units to be replaced will constrain the architecture of the new ship.

At some point you end up trying to change a pumpkin boat into an aircraft carrier, and there's no obvious way you can do that one piece at a time.

> If there's no good structure to build on, the units to be replaced will constrain the architecture of the new ship.

Which is why you do it in stages: add scaffolding until local rewrites are possible, then rewrite the business logic, then tear the scaffolding down.

That's a good analogy actually. Scaffolding is a kind of temporary test structure that you can use to maintain function while you figure out something better.

Maybe there are some underlying architectural problems that need to be addressed, but it would be impossible to make those changes from the current situation. It sounds like it is impossible to even know what code is live vs sitting on the server. How do you even know you have a firm grasp on the current architecture when it is unclear what code is even running the product?

There's a lot of low-hanging fruit to be addressed that will likely lead to meaningful improvements. Once the code is in better shape and some unfortunate legacy pattern is identified, then it can be considered time to re-tool the architecture.

Agreed. The first thing to do is figure out WTF is going on. This is perhaps the hardest kind of thing to do as a developer.
They’re saying to do it, eventually, incrementally and not all at once.
Full rewrite generally means "stop the presses, we are gonna migrate this whole thing from here to there, and no new features until it's done" (hint: it never gets done).
I’ve only ever witnessed ship-of-Theseus style migrations and those also never get done.
Does not compute... Ship of Theseus is just regular old development; of course it never gets done, but new features aren't put on hold.
I mean like “we want to replace X with Y”. Y incrementally starts replacing X, but 100% migration is never achieved, meaning double the API surface area exists indefinitely.

Because the migration doesn’t block new features, that means the org gets tired and reallocates the effort elsewhere before it’s ever done, with no immediate consequences. Rinse and repeat.

I think you've not witnessed Ship of Theseus, but "build Ship2 next to Ship1 and start using Ship2 while Ship1 is still being used and keep saying you're going to migrate to Ship2 eventually but meanwhile Ship1 and Ship2 diverge and now you have 2 ships".

I recently witnessed this mess and it is an enormous mess. Don't build Ship2 in the first place. Instead, replace Ship1's mast and sails, and rudder etc until you've replaced all the parts in Ship1. That's the SoT approach.

A "full rewrite" means that after the completion of the rewrite, the old code has been fully replaced by new code.

What you're describing is a "stop-the-world" rewrite.

A lovely knot to unravel!

First, get everything in source control!

Next, make it possible to spin service up locally, pointing at production DB.

Then, get the db running locally.

Then get another server and get CD (continuous deployment) to that server, including creating the db, schema, and sample data.

Then add tests, run on pr, then code review, then auto deploy to new server.

This should stop the bleeding… no more index-new_2021-test-john_v2.php

Add tests and start deleting code.

Spin up a production server, load balance to it. When confident it works, blow away the old one and redeploy to it. Use the new server for blue/green deployments.
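That cutover can live entirely in the nginx config; a sketch with invented addresses, where flipping traffic is a matter of swapping the `backup` flag and reloading:

```nginx
upstream app {
    server 10.0.0.10:80;         # blue: current production
    server 10.0.0.11:80 backup;  # green: receives traffic only if blue is down
}

server {
    listen 80;
    location / {
        proxy_pass http://app;
    }
}
```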

Write more tests for pages, clean up more code.

Pick a framework and use it for new pages, rewrite old pages only when major functionality changes. Don’t worry about multiple jquery versions on a page, lack of mvc, lack of framework, unless overhauling that page.

This is the right way to think about it. My only disagreement is that I'd do the local DB before the local service. A bunch of local versions of the service pointing at the production DB sounds like a time bomb.

And it's definitely worth emphasizing that having no framework, MVC, or templating library is not a real problem. Those things are nice if you're familiar with them, but if the team is familiar with 2003 vintage PHP, you should meet them there. That's still a thing you can write a website in.

> if the team is familiar with 2003 vintage PHP, you should meet them there. That's still a thing you can write a website in.

You can write a website in it, but you cannot test it for shit.

Good strategy. I would suggest not hooking it up to the prod DB at the start. Rather, script out something to restore prod DB backups nightly to a staging env. That way you can hook up non-prod instances to it and keep testing as the other engineers continue with what they do, until you can do a flip over as suggested. Key here is always having a somewhat up-to-date DB that matches prod but isn't prod, so you don't step on toes and have time to figure this out.

Note that going from no source control to a first CD instance in prod is going to take time... so assume you need a roll-out strat that won't block the other engineers.

Considering what sounds like reluctance to change, the switch to source control is also going to be hard. You might want to consider scripting something that takes the prod code and dumps it into SC automatically, until you have prod CD going... after that the engineers switch over to your garden-variety commit-based reviews and manually triggered prod deploys.

Good luck! It sounds like an interesting problem.
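The nightly refresh can be one small cron script. The sketch below only echoes the pipeline it would run (DRY_RUN defaults to 1), because every name in it (hosts, database, dump flags) is a placeholder for your actual setup:

```shell
# Nightly prod -> staging DB refresh (dry-run sketch; all names
# are placeholders, flip DRY_RUN to 0 once they point at real servers).
: "${DRY_RUN:=1}"
PROD_HOST=prod-db.internal
STAGE_HOST=staging-db.internal
DB=app

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

run sh -c "mysqldump -h $PROD_HOST --single-transaction $DB | mysql -h $STAGE_HOST $DB" \
  | tee /tmp/refresh-plan.txt

# crontab entry once the placeholders are real:
# 0 3 * * * DRY_RUN=0 /usr/local/bin/refresh-staging.sh
```

Anonymization of the dump (if it contains PII) would slot in between the dump and the load.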

Agree with this approach. You have nginx in front of it already so you can replace one page at a time without replacing everything.

One thing I haven’t seen mentioned here is introducing SSO on top of the existing stack, if it’s not there. SSO gives you heaps of flexibility in terms of where and how new pages can be developed. If you can get the old system to speak the new SSO, that can make it much easier to start writing new pages.

Ultimately, a complete rewrite is a huge risk; you can spend a year or 2 or more on it, and have it fail on launch, or just never finish. Smaller changes are less exciting, but (a) you find out quickly if it isn’t going to work out, and (b) once it’s started, the whole team knows how to do it; success doesn’t require you to stick around for 5 years. An evolutionary change is harder to kick off, but much more likely to succeed, since all the risk is up front.

Good luck.
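Concretely, the per-page takeover is a couple of `location` blocks in the existing nginx config (names and ports invented):

```nginx
# Send one migrated page to the new service; everything else
# still goes to the legacy PHP app.
location = /account/settings {
    proxy_pass http://127.0.0.1:8081;   # new app
}

location / {
    proxy_pass http://127.0.0.1:8080;   # legacy PHP
}
```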

I'd add putting a static code analysis tool in there, because that will give you a number for how bad it is (total number of issues at level 1 will do), and that number can be given to upper management; then, whilst doing all the above, you can show that the number is going down.

There is significant danger that management will use these metrics to micromanage your efforts. They will refuse changes that temporarily drive that number up, and force you to drive it down just to satisfy the tool.

For example, it is easy to see that low code coverage is a problem. The correct takeaway from that is to identify spots where coverage is weakest, rank them by business impact and actual risk (judged by code quality and expected or past changes) and add tests there. Iterate until satisfied.

The wrong approach would be to set something above 80% coverage as a strict goal, and force inconsequential and laborious test suites on to old code.

> Next, make it possible to spin service up locally, pointing at production DB.

I think this is bad advice, just skip it.

I would make a fresh copy of the production DB, remove PII if/where necessary and then work from a local DB. Make sure your DB server version is the same as on prod, same env etc.

You never know what type of routines you trigger when testing out things - and you do not want to hit the prod DB with this.

Can you just... Walk away? Not because of the technical challenges, but because:

> team is 3 people, quite junior
> resistance to change is huge
> productivity is abysmal
> aggressive roadmap
> management and HQ had no real understanding
> budget is tight

I have never walked away from a technical challenge, but I've exited from management clusterfucks and have never regretted it.

> this code generates more than 20 million dollars a year of revenue

From a business perspective, nothing is broken. In fact, they laid a golden goose.

> team is 3 people, quite junior. One backend, one front, one iOS/android. Resistance to change is huge.

My mistake, they didn't lay a golden goose--they built a money printer. The ROI here is insane.

> productivity is abysmal which is understandable. The mess is just too huge to be able to build anything.

But you just told me they built a $20M revenue product with 3 bozos. That sounds unbelievably productive.

> This business unit has a pretty aggressive roadmap as management and HQ has no real understanding of these blockers

You should consider quitting your job.

As far as the business is concerned, there are no problems... because well... they have a money printer, and your team seems not to care enough to advocate for change. Business people don't give a damn about code quality. They give a damn about value. If 2003 style PHP code does that, so be it. Forget a rewrite, why waste time and effort doing simple refactoring? To them, even that has negative financial value.

From their perspective, you're not being paid to make code easy to work with, you're being paid to ship product in a rat's nest. Maybe you could make a business case for why it's valuable to use source control, dependency management, a framework, routing outside of nginx, and so on... but it doesn't sound like any of that mattered on the road to $20M a year, so it will be very difficult to convince them otherwise, especially if your teammates resist.

This, again, is why you should consider leaving.

Some developers don't mind spaghetti, cowboy coding. You do. Don't subject yourself to a work environment and work style that's incompatible with you, especially when your teammates don't care either. I guarantee you will hate your job.

This is why we as a software world need some minimum standards for stuff that deals with sensitive information of users.

Don't get me wrong, if it works it works, but the question is for how long and who will suffer when it doesn't?

Also from a business perspective: If I were the CEO of that company I'd probably like to know that there is something built on sand and a huge amount of technical debt. It is a cash cow now, but I'd like to ensure it still can be one in the future. And for this some level of maintenance would be expected.

Same thing for reliability. If as a CEO I knew the entire functioning of the thing that brings in the cash hinges on one developer remembering whether index_john-final.php or index_peter-final-final.php is the latest code, I would probably have a hard time sleeping.

That means the minimum OP should do is explain the situation neutrally, and your point of view is certainly something he should weave into this. In the end the higher-ups need to know this, and what they decide to do is their thing, but OP needs to make them aware of why this could potentially endanger the service in the future. If they then decide to take that risk — so be it.

Best answer. If the money is ok, and the environment not too toxic/stressful, you might just see it as a challenge to secretly improve a codebase without anybody noticing, while still delivering what the higher-ups want to see. Or maybe just scratch the first part and try to see how much further you can push that turd with every coding crime imaginable. One-up the juniors in ugly hack Olympics. Ship a feature and put it on your CV before leaving.

Otherwise, walk away immediately.

This is very insightful. I learned it kind of the hard way. The business world is a mess. Requirements are a mess and always changing. This leads to messy code that requires a lot of time to clean up. You don't have time for that as long as there is always more customer wishes and projects coming in. As long as the business keeps working there's always something of top priority coming in. The pain starts growing but the steaming pile of code just doesn't collapse. It just kind of keeps on working while you are adding more and more code. Sure, the pain is big and progress is quite slow but what's the alternative?

My advice would be to listen to the developers, to understand them and the business. To understand what they really need, and what a viable path forward would be: a complete rewrite, a second system for new developments, many more developers, or something else. Or maybe it is the optimum solution right now, because the whole company is so messy and your job is not to change the company structure. Then maybe you could support them by slowly enhancing their skill set, and accept what you can't change. Doesn't sound like fun? Then leave soon, staying won't do you any good.

It wasn't necessarily written by those 3 devs. They're just the current team. Granted, they probably have been that for a long time because of the resistance to change, but the brightest minds are probably long gone.
I'd bet it's B2B and has an expensive sales division that top management believes (tbh maybe even rightly) is the real revenue driver.
$20M revenue is not the same as $20M profit.

It’s not the same, but if that $20M is primarily generated by the software, then it’s those 3 people who contribute to the top line. The rest, like sales and marketing, are irrelevant: fire them and the product will keep generating revenue off the existing customer base. It will stop doing so, however, if the product breaks. So, the post above is right to an extent, this is the golden goose. ))
Unless the revenue is for products ordered on the site and shipped to paying customers. Believe it or don't, this is still done at some sites that are not Amazon.
> team is 3 people, quite junior.

Even at FAANG salaries this wouldn't be that much compared with $20M

You don't know what the costs are though. The site could have huge costs of content acquisition or any number of reasons to not be making anywhere near $20 million profit.
Revenue isn't the same as Earnings Before Salaries either.

E.g. maybe it's an e-wholesaler or widget reseller, bought $19M goods and sold $20M. Or maybe it was much slower than expected, they actually bought $25M goods and are burning 500k/month on warehousing. Or whatever.

Yes, but the parent is saying this could be an e-commerce website, or construction company, etc. But having an iOS app feels unnecessary for most of these businesses
The engineering team might be 3 people but not the whole company.
>> team is 3 people, quite junior. One backend, one front, one iOS/android. Resistance to change is huge.

> My mistake, they didn't lay a golden goose--they built a money printer. The ROI here is insane.

We can't draw that conclusion. Presumably the business still needs salespeople, support, etc.

That's what I was thinking as well. Run for the hills.

It's likely a losing uphill battle.

I think the thing is that the codebase after 5 years of development was already capable of making 19M/year, and the past 7 extra years have just added 1M/year. The next year of development will not add any more because the thing is collapsing under its own weight.
so it will just be 20 mil a year with 3 jr devs

sounds ok?

Only in absolute terms. Kinda feels like a missed chance if you actually have a 100M market.
Great answer. Developers are gunna hate it
This is the sanest answer. No amount of leadership is going to help an incompetent team. A codebase with massive technical debt, tight coupling, and accidental complexity will be hard to improve incrementally. Impossible without competent engineers.
I agree it's the sane answer. But I don't think these engineers are incompetent. They lacked direction, accidentally followed worst practices, and _still_ came out on top. I would say they are good engineers but perhaps bad project managers / architects.

You don't not use source control because nobody directed you to and you 'accidentally'... what, forgot about it?

You don't use it because you haven't heard of it; i.e., not competent.

I find it hard to imagine you’d never heard of source control by now. You’d have to have been living under a rock for the past 15 years.
1. Grab a copy of Working Effectively With Legacy Code

2. You say you don’t manage the team. I guess you have some kind of ‘tech lead’ role. I think to get things to change, you’re going to need buy in from management and the team. If the budget is tight it will be harder to say ‘we need to invest in fixing all this stuff instead of whatever it is that actually makes money’. Whatever you do must have a good business case. It sounds like there needs to be better communication about the state of things with whoever in the business unit came up with the aggressive roadmap.

Perhaps a roadmap like this would work:

- First, set up source control and separate prod from however people are developing things. Hopefully this will reduce trivial outages from people eg making a syntax error when editing prod. I think this will be a difficult fight with the team and management may not understand what you’re doing. You’ll likely need to be ready to be the person who answers everyone’s git questions and un-fucks their local repos. You’ll probably also want some metrics or something to show that you are reducing trivial errors.

- I think some intermediate stages might involve people still developing in prod but having source control there and committing changes; then developing locally with a short feedback loop from pushing to running on prod (you won’t get buy-in if you make the development process slower/more inconvenient for the team); then you can hopefully add some trivial tests like php syntax checks, and then slowly build up a local dev environment that is separate from prod, and more tests. At some point you could eg use branches and perhaps some kind of code-review process (you can’t be the only person responsible for code review, to be clear)

- You’re going to want a way to delete old code. I think partly you will be able to find unreachable code and delete it, but also you’ll likely want a way to easily instrument a function to see if it is ever used in prod over eg a week or two.

- Eventually, improving the dev environment enough may have already led to some necessary refactors and you’ll have enough tests that the defect rate will have decreased. At some point you’ll hopefully be confident enough to make bigger refactors or deletions and wean people further off messing with prod. For example moving some routing, bit-by-bit, outside of nginx, or perhaps using some lightweight framework.

- you should also get the team involved in making some smaller refactors too and they should definitely be involved in adding tests.
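On the "is this ever used in prod" question above: for whole routes, the nginx access log already holds the answer. A quick census in shell, assuming the default combined log format (where the request path is field 7); the demo log lines are fabricated:

```shell
# Count hits per request path; in real use, point it at
# /var/log/nginx/access.log instead of the demo file.
top_paths() {
  awk '{ print $7 }' "$@" | sort | uniq -c | sort -rn
}

# Fabricated excerpt for demonstration
cat > /tmp/access.log <<'EOF'
1.2.3.4 - - [01/Jan/2023:00:00:00 +0000] "GET /index.php HTTP/1.1" 200 123 "-" "x"
1.2.3.4 - - [01/Jan/2023:00:00:01 +0000] "GET /index.php HTTP/1.1" 200 123 "-" "x"
1.2.3.4 - - [01/Jan/2023:00:00:02 +0000] "GET /index-new_2021-test-john_v2.php HTTP/1.1" 200 9 "-" "x"
EOF
top_paths /tmp/access.log
```

Paths that never show up over a few weeks of logs are strong deletion candidates; per-function usage still needs in-code instrumentation.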

First of all: PHP is fine. It really is.

Second: Doing a full rewrite with a junior team is not going to end well. They’ll just make other mistakes in the rewritten app, and then you’ll be back where your started.

You need to gradually introduce better engineering practices, while at the same time keeping the project up and running (i.e. meeting business needs). I’d start with introducing revision control (git), then some static testing (phpstan, eslint), then some CI to run the test automatically, then unit/integration tests (phpunit), etc. These things should be introduced one at a time and over a timespan of months probably.
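For the static-testing step, PHPStan is designed to be bolted onto messy codebases: start at its loosest level and ratchet up one level at a time. A minimal config sketch (the paths are placeholders):

```neon
# phpstan.neon
parameters:
    level: 0          # most permissive; raise gradually as issues are fixed
    paths:
        - public
        - lib
```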

I’d also have a sort of long term technical vision to strive against, like “we are going to move away from our home-written framework towards Laravel”, or “we are moving towards building the client with React Native”, or whatever you think is a good end outcome.

You also need to shield the team from upper management and let them just focus on the engineering stuff. This means you need to understand the business side, and advocate for your team and product in the rest of the organization.

You have a lot of work ahead of you. Be communicative and strive towards letting people and business grow. I can see you focus a lot on the technical aspects. Try to not let that consume too much of your attention, but try to shift towards business and people instead.

the only reason people dunk on php is because of experiences like this

from a user perspective, seeing a php file extension is an accurate predictor of a disorganized mess of everything and a “LAMP stack” stuck in 2003, just as described here

from a developer perspective it’s correlated with everything described by OP

you’re correct it isn’t inherently php’s problem, it can do RESTful APIs and a coherent code design pattern no problem

Lots of people are giving advice on how to fix the code piecemeal. First put it on Git, then add tests, then, carefully and gradually, start fixing the issues. Depending on the project, this could take a year or several years, which isn't bad.

The problem with this plan is corporate politics. Say that OP takes on this challenge. He makes a plan and carefully and patiently executes it. Say that in six months he's already fixed 30% of the problem, and by doing so he meaningfully improved the team's productivity.

The executives are happy. The disaster was averted, and now they can ask for more features and get them more quickly, which they do.

Congratulations, OP. You are now the team lead of a mediocre software project. You want to continue fixing the code beyond the 30%? Management will be happy for you to take it as a personal project. After all, you probably don't have anything to do on the weekend anyway.

You could stand strong and refuse to improve the infrastructure until the company explicitly prioritizes it. But then why would that job be better than just taking a random position in a FAANG company? The code quality will be better and so will the pay.

A lot of people see this as a technical challenge, but it is a political one. The road the business took to get to this point is crucial, and understanding whether the business want to fix it is key.

In my experience, once businesses get into this sort of mess they never work their way out of it. To use an analogy, there is a point with people who make terrible lifestyle choices (smoking, obesity, etc.) where the damage is too far gone.

A company I used to work for had a horrendous codebase that was making them a ton of revenue. It wasn't as bad as the OP's codebase, but it was pretty terrible. It was wedded to a framework that was abandoned 8 years ago. Everything was strongly coupled to everything else, meaning it was brittle. Every release they'd have to have an army of QA testers go through it and they'd find 100's of new bugs. Every bug that got fixed introduced another.

The lesson I learned? Find these things out during an interview. Ask about their CI, ask about QA and automated testing and really push them on details. If they give vague answers or something doesn't smell right, walk away.

Yes, because it's famously easy to just go grab a FAANG job whenever you feel like it, wherever you are in the world.
Not easy, but easier in my opinion than the multi-year project that OP is planning to undertake.
> But then why would that job be better than just taking a random position in a FAANG company? The code quality will be better and so will the pay.

Exactly the thing people are missing here. It's a lot of work at a very high skill level, with a lot of political fallout if refactoring breaks things or suddenly slows delivery down, all at a mediocre shop.

This is more reason for the gradual approach. Bringing source control, testing, and separation of dev and prod environments, makes things safer and will speed up work pretty soon by making the people much more comfortable to actually try things.
I've done that progressively over two years as a junior and I'm basically unofficial tech lead now. Managers listen to me and I can plan and influence several projects. I have other goals so I won't become official team lead by choice, but that's probably valuable to OP if he can pull that off.
Feels like you should start with introducing version control, dependency management, and creating a deploy process after that.

Those seem like low-hanging fruit that are unlikely to affect prod.

You should also probably spend a decent amount of time convincing management of the situation. If they're oblivious that's never going to go well.

I agree a full rewrite is a mistake and you have to instead fix bite-sized chunks. It will also help to do that if you start to invest in tooling, a deploy story, and eventually tests (I'm assuming there are none). If I were making 20 million off some code I'd sure as heck prioritize testing stuff (at least laying the groundwork).

It's probably also worth determining how risk-tolerant the product is; you could probably move faster cleaning up if it is something that can accept risk. If it's super critical, I'd seriously prioritize setting up regression testing in some form first
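On the dependency-management point: Composer can be introduced without touching the existing require_once calls; a near-empty manifest with a classmap autoloader coexists fine with them (the directory name here is a guess):

```json
{
    "require": {},
    "autoload": {
        "classmap": ["lib/"]
    }
}
```

After `composer install`, new code can `require 'vendor/autoload.php'` while the old code keeps doing what it does; actual dependencies can then be adopted one at a time.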

> Feels like you should start with introducing version control, dependency management, and creating a deploy process after that.

Agree with this 100%. This sounds like a team that may not even know how to develop locally. Showing them that they can run the system on their own computer and make experimental changes without risking destroying the company would be a huge game changer. If they’re not open to that idea it may really be hopeless.
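A first local environment doesn't need to mirror prod perfectly; a two-service compose file is often enough to let people experiment without fear. The versions and paths below are guesses and should be adjusted to whatever prod actually runs:

```yaml
# docker-compose.yml (sketch)
services:
  web:
    image: php:7.4-apache        # match prod's PHP version
    volumes:
      - ./:/var/www/html
    ports:
      - "8080:80"
  db:
    image: mysql:5.7             # match prod's MySQL version
    environment:
      MYSQL_ROOT_PASSWORD: devonly
```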

Plenty of good suggestions in here about e.g. tests, source control etc. You will need them all.

But I would start by choosing how and whether to fix up the crown jewels, the database.

You say that instead of adding columns, the team has been adding new tables. With such behaviours, it's possible your database is such a steaming pile of crap that you'll be unable to move at any pace at all until you fix the database. Certainly if management want e.g. reporting tools added, you'd be much better off fixing the database first. On the other hand, if the new functionality doesn't require significant database interaction (maybe you're just tarting up the front end and adding some eye candy) then maybe you can leave it be. Unlikely, I would imagine.

Do not, however, just leave the database as a steaming pile of crap and at the same time start writing a whole lot of new code against it. Every shitty database design decision made over the previous years will echo down and make its ugly way into your nice new code. You will be better off in the long run normalising and rationalising the DB first.
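For those bolted-on join tables specifically, folding one back into its parent can be staged so nothing breaks midway. A sketch with invented table and column names (MySQL syntax assumed; rehearse it on the staging copy first):

```sql
-- 1. add the real column (nullable, so existing writes keep working)
ALTER TABLE orders ADD COLUMN tracking_code VARCHAR(64) NULL;

-- 2. backfill from the bolt-on table
UPDATE orders o
JOIN order_tracking t ON t.order_id = o.id
SET o.tracking_code = t.tracking_code;

-- 3. migrate readers/writers to orders.tracking_code; only once
--    nothing references the old table any more: DROP TABLE order_tracking;
```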

Good point. Using stored procedures / views etc will help crystalise the API for the DB and allow work to happen behind that wall without breaking anything else in the meantime too. Once the work is done, bits of the wall can be replaced with better bits of wall i.e. improved sp's and views pointing to an improved schema.
Here's a way to introduce version control without having to stop everyone and teach them how to use it first:

1. Commit the entire production codebase to git and push it to a host (GitHub would be easiest here)

2. Set up a cron that runs once every ten minutes and commits ALL changes (with a dummy commit message) and pushes the result

Now you have a repo that's capturing changes. If someone messes up you have a chance to recover. You can also keep track of what changes are being applied using the commit log.

You can put this in place without anyone having to change their current processes.

Obviously you should aim to get them to use git properly, with proper commit messages - and eventually with production deploys happening from your git repository rather than people editing files in production!

But you can get a lot of value straight away from using this trick.

It's basically a form of git scraping: https://simonwillison.net/2020/Oct/9/git-scraping/
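A sketch of steps 1-2 as a single cron-able script; CODE_DIR points at a throwaway demo directory here and would be the real webroot in the crontab line:

```shell
# Blindly snapshot whatever is currently on the server into git.
CODE_DIR=${CODE_DIR:-/tmp/snapshot-demo}   # real use: /var/www/html or similar
mkdir -p "$CODE_DIR"

snapshot() {
  cd "$CODE_DIR" || return 1
  [ -d .git ] || git init -q
  git add -A
  # commit only when something actually changed
  git diff --cached --quiet ||
    git -c user.name=snapshot -c user.email=snapshot@localhost \
        commit -q -m "auto-snapshot $(date -u +%Y-%m-%dT%H:%M:%SZ)"
}

# demo: one change produces one commit
echo '<?php // hello' > "$CODE_DIR/index.php"
snapshot
git -C "$CODE_DIR" log --oneline

# crontab entry (every 10 minutes):
# */10 * * * * CODE_DIR=/var/www/html /usr/local/bin/snapshot.sh
```

Pushing to a host after each commit is one more line (`git push -q origin main`) once a remote exists.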

How do you make sure that the code being committed is ready to be run? Files could be saved before they're ready. I'm assuming this won't happen on a production server, but you can't be sure it isn't just someone's code workspace.

The system is an enormous black box and this at least tells you what N things were being manipulated at point in time X. Easy to set up, and gives just a bit of peace of mind if the thing keels over one day.

This doesn't.

It's not perfect, it's a step in the right direction.

Love your Git scraping technique, very clever. Thanks for sharing.
Save yourself a lot of trauma and get out of this mess. Been in a similar situation, spent five years trying to fix things, gave up. Could have saved myself some of the therapy I now need.
This alternative is not to be neglected. You don't have to save the whole world. Save what you can that is worth saving. If you are competent to build good new things, do that.
Why is 'saving the world' even on the table here or in similar cases? Guys making $20mil a year don't look to me like they need saving from above. If you have that kind of money I trust you know what you're doing and can pay for help when help is needed. Otherwise, you don't deserve that cash. Other people might spend it better.

I, for one, have never been saved out of good heart or pity. Every doctor visit somehow results in money being transferred from my bank account to theirs.

Those people didn't have to be doctors, or nurses or med techs. They all need to make a living, but not necessarily by probing you. A good many of them entered medicine because it seemed like something they could do to help people. Even if that has since all burned away, you still benefited.

I spent a career not becoming a millionaire at Microsoft (definitely on the table, at the time) because Microsoft was and remains too evil. Likewise Oracle. Or making weapons. I do not answer Google recruiters' e-mail, and not just because Google's interview process is far too annoying for anybody with any self-respect to tolerate. (Who does work there? Many worked for companies Google bought.)

Doing work that benefits humanity, or natural ecosystems or whatever, is the reason to do things. The money it pays is how you afford to be able to spend your life doing that.

I feel sorry for people who work knowing the work they do makes the world worse. But not very sorry, because most have a choice. Some find ways to add value from within, e.g. two I know at Oracle do Free Software full time. I mostly cannot tell which do or don't, so I do not condemn all Microsoft, Google, Facebook, Oracle, BAE employees. But I choose not to be there.

Yes, life is too short and there are so many much better jobs to waste time on such a project.
Do you work at Google? They never repair things. Hence 5 (10?) unfinished chat programs.
A Google employee would have told you that they work for Google.
I can imagine how some super senior engineer might like this kind of very challenging experience.
A super senior consultant might: they are paid by the hour/day and they are free to fire the client and leave if the working situation becomes too hairy.

As an employee, the best advice is GTFO.

True, perhaps some are into maintaining shitty legacy systems with not enough budget.
OP is clearly not senior. If they were they would know how to get from A to B.
Unpopular opinion: this goes to show that you don't need no fancy microservices, distributed, asynchronous, highly available architecture to build a product that "generates more than 20 million dollars a year of revenue". No unikernels. No Kubernetes. None of that cloud native mumbo-jumbo.
Most software developers are driven by having marketable skills, and that requires having strong opinions so they can grift their way past recruiters and other developers who do the same thing.

Playing around with an out-of-vogue programming language in a company monorepo is a waste of time, in comparison.

You may create a new system from scratch and write the new features there, while temporarily leaving the mess where it is.

The team will get to experience how good programming can be and will perhaps support you more. From there you can gradually move the old features into the new system. Even if you never fully complete the refactoring, the situation will be much better.

Respectfully, I don't think you are viewing this rationally, and you need to take a step back.

Some of these things are terrible choices, but some are just weird choices that aren't necessarily terrible, or a minor inconvenience at most.

E.g. no source control: obviously that is terrible. But it's also trivial to rectify. You could have fixed that in less time than it took to write this post.

On the other hand, "it runs on PHP" - I know PHP isn't cool anymore, but sheesh, not being cool has no bearing on how maintainable something is.

> "it doesn't use composer or any dependency management. It's all require_once."

A weird choice, and one that is certainly a bit messy, but hardly the end of the world in and of itself.

> it doesn't use any framework

What really matters is whether it's a mess of spaghetti code. You can do that with or without a framework.

> no caching ( but there is memcached but only used for sessions ...)

Is performance unacceptable? If not, then that sounds like the right choice (premature optimization)...

> the database structure is the same mess, no migrations, etc... When adding a column, because of the volume of data, they add a new table with a join.

Not ideal... but also pretty minor.

Anyway, my point is that what you're describing is definitely not ideal, but on the scale of legacy nightmares it seems not that bad.

Here is what I would recommend as a path, not towards changing "how we do things around here"(because that always gets negotiated against a certain business case), but as a way of gaining personal developer comfort(the actual nuts and bolts of the work):

1. Start adding logging all throughout, wherever changes are being made. That can quickly build up insight into what's happening where, and build confidence about what can be deleted safely. You want the meeting where you can show that an entire file is completely unused and hasn't been called in months. It surely exists. Find it. Then say you won't delete it, you'll just comment it out.

2. As you make changes, start doing things twice: one in the way that patches the code as directly as you can manage, the other a stub into a possible design pattern. You don't want to force the pattern into production as soon as you think it works, instead you wait until the code hits a certain evolutionary state where you can "harvest" it easily. Think "architecture as a feature-flag". If it turns out your design can't work, nothing bad happens, you just delete those stubs and give it another go.

3. I would not actually worry about the state of the tooling otherwise. Backups for recovering from the catastrophic, yes. Getting the team on git, not as important. Adding standardized tooling is comforting to you because you're parachuting in. It adds more moving parts for the other devs. That's true even when they benefit from it: if the expected cost of wielding a tool wrongly is high enough to cause immediate danger, you can't proceed down that road - in woodworking that means lost fingers, in software it means lost data. You have to expect to wind down the mess in a low-impact, possibly home-grown way. There are always alternatives in software. And there are likewise always ways of causing fires to fight.

This job most likely isn't going to lead towards using anything new and hot. But if you go in with an attitude of seeing what you can make of the climate as it is, it will teach you things you never knew about maintenance.
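Point 1 can start even before touching PHP: cross-reference the files on disk against what the web server has actually served. A rough sketch, assuming nginx-style access logs (the paths below are hypothetical, and with 10,000 lines of rewrites the logged URLs won't map 1:1 to files, so treat the output as candidates only):

```shell
#!/bin/sh
# List .php files that never appear in the access log: candidates for
# commenting out (not deleting!). Paths below are examples.
LOG=${1:-/var/log/nginx/access.log}
DOCROOT=${2:-/var/www/app}
tmp=$(mktemp -d)

# every .php file the app contains, relative to the docroot
find "$DOCROOT" -name '*.php' -printf '%P\n' | sort > "$tmp/all"
# every .php path that was actually requested
grep -o '/[^" ]*\.php' "$LOG" | sed 's|^/||' | sort -u > "$tmp/hit"
# on disk but never requested
comm -23 "$tmp/all" "$tmp/hit"
```

Run it against a few months of rotated logs before trusting any "unused" verdict; rarely used admin or cron-driven pages won't show up in a short window.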

1. Build a functional test system, and write a big suite of tests which ensure preservation of behavior. Make it easy to run and see the results.

2. Slowly start extracting code and making small functions. Document like crazy in the code as you learn. Keep the single file or close to it, and don't worry about frameworks yet.

3. Introduce unit tests with each new function if you can.

After all that is done, make a plan for next steps (framework, practices, replace tech etc).

Along the way, take the jr backend engineer under your wing, explain everything, and ensure they are a strong ally.

Call me crazy, but that project sounds like fun.
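Step 1 doesn't need a framework to start paying off. A bare-bones characterization harness (the URLs and paths are made-up examples): it records the current responses as golden files on the first run, then flags any divergence afterwards.

```shell
#!/bin/sh
# golden.sh - record-and-compare characterization tests for a black-box app.
BASE=${BASE:-http://localhost:8080}
GOLDEN=${GOLDEN:-./golden}
mkdir -p "$GOLDEN"

check() { # check <name> <path>
    got=$(curl -s "$BASE$2")
    if [ ! -f "$GOLDEN/$1" ]; then
        printf '%s' "$got" > "$GOLDEN/$1"   # first run: record the baseline
        echo "RECORDED $1"
    elif printf '%s' "$got" | cmp -s - "$GOLDEN/$1"; then
        echo "PASS $1"
    else
        echo "FAIL $1"
    fi
}

# example endpoints - replace with the routes that actually earn the revenue
check homepage "/index.php"
check product_list "/products.php?page=1"
```

Dynamic content (timestamps, session tokens) will need filtering out with sed before comparison, but even a handful of these tests turns "did I break anything?" from a guess into a command.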

> Build a functional test system, and write a big suite of tests which ensure preservation of behavior. Make it easy to run and see the results.

Keep it to yourself and don't let anyone know why you are so effective.

Demand a raise early once you are sure of your value.

Edit: why not? Clearly this is a huge value that would be wholly unappreciated without leveraging it yourself.

OP is leading a team, not hiding in a corner churning out code. The "secret superpower" strategy doesn't work here.
> This business unit has a pretty aggressive roadmap as management and HQ has no real understanding of these blockers. And post COVID, budget is really tight

In my view, as long as management believes this, a fix is not possible at all.

You should forget about improving the code and instead see your job as a kind of consultancy thing, where you teach management about what they have and what the consequences are.

And probably look for a new job. If you are completely successful in teaching management, it may be worth working on this, but it'd probably need to be renegotiated as if it were a new job.

Why is it necessary? It's ugly, but it's earning millions. Find out why resistance to change is huge. Maybe the devs are stuck in their ways, or maybe it's too damn scary for them to fail because the business comes down hard on them? Get to know your devs, find out from them what they think the problems are and how they would like to solve them. Maybe they just need more help. And as the team lead it sounds like you need to bring the "business unit" up to speed on the current reality?
Even with the worst possible codebase, the act of it running stably for an amount of time is an accumulated value. And that value is actually a pretty big deal. When you make major changes, you reset that value back to zero, even if your new codebase is beautiful and sensible.

Very gradual, well-tested evolutions is the way to go. If it were me I would add a LOT of unit and integration tests before I changed anything. I would also formalize the expected behaviour, schemas, APIs, etc.

You’ve inherited the Ship of Theseus. Believe it or not, this is actually a huge boon for you. 18 months from now your managers will look back and say, “wow this is the same ship?! I want you on my team wherever I end up.”

> I know a full rewrite is necessary, but how to balance it?

A full rewrite of a functional 12-year-old application? Yeah, you're going to waste years and deliver something that is functionally worse than what you have. It took 12 years to build; it would realistically take years to rebuild. Fixing this will take years and, honestly, some serious skill.

What you want to do is build something in front of your mudball application. For the most part your application will be working. It's just a mudball.

Step 0. Make management and HQ understand the state of the application. To do this I would make a presentation explaining and showing best practices from various project docs and then show what you have. Without this step, everything else is pointless.

If they don't understand how bad it is. You will fail. Failure is the only option.

If the team is not willing to change and you're not able to force change then you're going to fail.

So once you have the ability to implement changes.

Step 1. Add version control.

Step 2. Add a deployment process to stop developing in production.

Step 3. Standardise the development env.

If you have views and not intermingled php & html:

Step 4. Start a new frontend and create endpoints that reuse the original code to return json for all the variables.

If not:

Step 4. Add views. Copy all the HTML into another file and then make a note of the variables.

Step 5. Start a new frontend and create endpoints that reuse the original code to return JSON for all the variables.

... Carry on moving things over to the new frontend until everything is in the frontend.

Probably a year later.

Step 6. When adding new functionality you can either rewrite that section, do a decorator approach, or edit the original functionality.

That's without fixing the database mess or infra mess.

Others have said this, but I'd like to put emphasis on getting everything into source control. If the other developers don't know about source control (!) they will love it. I'd spin up my own local source control for my own changes and, after a few changes, show them the advantages.

Second: making a change without tests is like walking in the dark without a flashlight. Having tests is a very important thing.

Read "Working Effectively with Legacy Code" by Michael Feathers, one of the best books I've read that can really help in situations like this. In summary, it boils down to having tests to aid the changes you need to make.

3 junior engineers are holding together legacy code that generates 20 mil a year having had no leadership that has taught them any sort of best practices? Give them all raises and get over yourself.
OP says it has been developed over the course of 12 years. The team is currently 3 junior devs. There's probably a huge turnover.

It's likely there's a lot of history and political shenanigans that OP isn't aware of yet. This could be a sinking ship. If it's a profitable business why is the team made of juniors?

A small company with legacy code that is a huge mess but that is maintained by the same person for the last 10 years is one thing. The same mess in the hands of 3 juniors who don't even use version control means no one with experience has lasted long enough at this company. That's a red flag.

I'd be curious about the premise under which they hired you. Did they hire you to re-do the application, knowing it was not in great shape? Or did they just hire you to up the team from 3 to 4, imagining a relative boost in productivity?

Also, the "without managing them directly" is interesting. Are you a peer of the existing three team members?

There is a way out of this mess. It is not straightforward, though. There are two ways:

1. Convince the business team that these team members might leave and put the $20mn revenue at risk. There is no way you can make them learn and do things properly. Therefore, get a separate budget and hire a new, separate team. Do a full rewrite of the backend and plug the app and new website into it. It would be a 1-2 year project with a high chance of failing outright (a big-bang release: stressful, with a large chance of you getting fired, but once done you can let the old team go and give the business a completely new setup and team) or failing partially (meaning a large part of traffic moves to the new system but some parts remain, making the whole transition slow, painful, complex, and never-ending).

2. Add strong, senior PHP developers to the existing team. Ask the new senior members not to fight with the existing devs but to train them; they will listen, as the new people will know more. Slowly add version control, staging/dev environments, a PHP framework for new code, caching, a CI/CD pipeline, an automated test suite built by an external agency, etc. This would be low risk, as the business team would see immediate benefits/speedups. Rewrite portions of code that are too rusty and remove code that is not required anymore. This would possibly take 5-6 years to complete, giving you ample job security while achieving results in a stable manner.

I'd strongly advocate for what I call "The Boy Scout Campsite Approach to Code Improvement". Boy Scouts have a motto "Leave the Campsite Better than You Found It". What I'd do is:

* Pair program to teach the people you work with that there is another way; they may simply not know any better.

* Make any code you touch better; new or old, it doesn't make any difference. Do it right.

* Important: NEVER, EVER COMPROMISE ON THIS!!! Seriously, skip it one time and it can all be downhill from there (sayeth the voice of regretted experience).

Rewriting would probably be the biggest mistake you could make. You need to refactor things little by little, and avoid making changes that would later be made redundant by other changes.

Figure out what you want to fix first, and fix that. Then go to the next thing. But keep in mind: "management and HQ has no real understanding", and as far as they are concerned, what they have works.

If this doesn't sound like something you want to do, then find a new job. You are effectively the property manager for a run-down rental property. You aren't going to convince the owners to tear it down and build a new set of condos.

> you are effectively the property manager for a run-down rental property.

This is an incredibly powerful analogy. Thank you!

Sounds like you're working with a previous client of mine.

The best solution - for me - ended up being dropping them as a client. There was zero interest in change from either developers or management (no matter how senior).

We parted ways and I wished them good luck.

Occasionally I wonder what happened to the application containing 50,000 procedural PHP files. Yes, 50k. And no source control or off-server backup.

Yes. It's a pointless uphill battle to try to change people who don't want to change. The employees have a lot of leverage by not documenting the mess. If they leave you will take the blame.

Get another job ASAP. Let natural selection do its magic.

The "employee lock-in" played a heavy part too. No documentation meant they had significant leverage - it was an 8-person team who had worked together for nearly 15 years.
I think the main question, whether you're the team lead or just an individual contributor, is: were you asked to fix the things you list? If you weren't, it's not your job. And even if you try, you will not succeed, because some people like things as they are (otherwise they wouldn't be that way), and if you don't have support from above then you won't be able to overrule those people.

One thing you could do, if you haven't been asked to fix these things, is to "provoke" management into asking you to fix them. You could talk to your boss and ask them what they don't like about the current setup. They might answer that velocity is too slow, that the software is too unreliable or has too many bugs, or they might answer that everything's fine and they just want you to implement their new features. Be careful not to lead management here; you want to find out what they actually want, not persuade them to want something (that won't work - it won't be a real desire). If they do want you to change something, you can argue for some of the suggestions in this thread (e.g. introducing VCS) where you can clearly draw an argument from one of their desires, e.g. problem: "releases are too risky", solution: "if we use VCS we have old versions and can roll back".

Basically you've been hired to do a job. If your job is to fix all this stuff, fair enough. But if you haven't been asked to do this (and you can't provoke them to ask you) then it's simply not your job, and you have to accept the situation or find a new job.

Many comments are permutations of the same suggestions, so here's mine:

Get the business to buy into "fixing" this before doing anything. Convince them to hire more people; it sounds like the current team is already swamped.

If you don't get business buy-in, it may be the wrong place for you.

Wow hard one to untangle and answer in a HN reply!

I sort of think... if you have to ask this here you might be in the wrong job? Was this a job that seemed like something else then became this? This sounds like a job for an experienced VP Engineering. It is a tough order. Wouldn't know how to do it myself. Lots of technical challenges, people challenges, growth challenges, and managing up and down.

The resistance to change is something you need to get to the bottom of. People are naturally resistant to change if they are comfortable, and we've all been through 'crappy' changes before at companies and been burned.

The solution might be to get them to state the problems and get them to suggest solutions. You are acting more like a facilitator than an architect or a boss. If one of them suggests using SVN or Git because they are pissed off their changes got lost last week, then it was their idea. No need to sell it.

This assumes the team feels like a unit. If the 3 are individualistic, then that should be sorted first. E.g. if Frank thinks it is a problem but no one else does, and they can't agree amongst themselves, then the idea is not sold yet.

Once you know more about what your team think the problems are and add in a pinch of your own intuitions you might be able to formulate confidently the problems, so you can manage their expectations.

Create a 'risk register', or 'service schedule' - we use this for building service contracts, where you list out all the things that need to be done or might happen (from backups to support requests), and put a rough cost on them all. We put a min and max cost on each item, and the number of times per year it might happen.

That gives you an annual maintenance cost which will include, say "every 2 years something goes badly wrong with the flargle blargle, and costs $10,000 to fix", or "every 3 days we have to clear out the wurble gurble to stop it all crashing".

Finally, you put together the same thing but for a re-written version, or even with some basic improvements as others have suggested, and hopefully you see a lower total cost of maintenance.

At that point, you can weigh the cost of either a rewrite or incremental improvements in actual dollars.
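The arithmetic behind such a register is simple enough to keep in a spreadsheet or a one-liner: expected annual cost per item = occurrences per year times the midpoint of the min/max cost. A toy sketch (the line items and figures are invented for illustration, borrowing the names above):

```shell
#!/bin/sh
# risk register: expected annual cost per item = per_year * (min+max)/2
awk -F, 'NR > 1 { total += $2 * ($3 + $4) / 2 }
         END    { printf "expected annual cost: $%.0f\n", total }' <<'EOF'
item,per_year,min_cost,max_cost
flargle blargle failure,0.5,8000,12000
wurble gurble cleanout,120,20,50
EOF
# -> expected annual cost: $9200
```

Here the rare $10k-average failure contributes $5,000/year and the every-3-days cleanout another $4,200/year, which is the kind of number management can compare against a rewrite estimate.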

I'm no expert, but the advice in Working Effectively with Legacy Code has been helpful on occasion:

https://www.amazon.com/Working-Effectively-Legacy-Michael-Fe....

Fully apprising management of the situation in a way they can understand may also reap long-term dividends.

I've been in almost this exact same situation with a slightly smaller team and "only" about $5 million running through the tangled web of php.

We did a complete rewrite into a Django application, it took 2 years and untold political pain but was absolutely the correct choice. The legacy code was beyond saving and everyone on the team agreed with this assessment - meaning our political battles were only outward facing.

In order to get support, we started very small, with it as a "20% project" for some of our engineers. After level-setting auth, CI/CD, and infrastructure stuff, we began with one commonly used piece of functionality and redirected the legacy PHP page to the new Python-based page. Every sprint, in addition to all the firefighting we were doing, we'd make another stealth replacement of a legacy feature with its updated alternative.

Eventually we had enough evidence that the replacements were good (users impressed with responsiveness, upgraded UI stuff like replacing default buttons with bootstrap, etc.) that we got a blessing to make this a larger project. As the project succeeded piecemeal, we built more momentum and more wins until we had decent senior leadership backing.

Advocating for this change was basically the full-time job of our non-technical team members for 2 straight years. We had good engineers quit, got into deeply frustrating fights with basically every department in the company, and had a rough go of it. In the end, though, it worked out very well: a huge reduction in cost and complexity, the ability to support really impactful stuff for the business with agility, and a ton of fulfilling dev experience for our engineers too.

All this is to say, I understand where everyone warning you not to do a rewrite is coming from. It's a deeply painful experience and not one to be embraced lightly. Your immediate leadership needs to genuinely believe in the effort and be willing to expend significant political capital on it. Your team also needs to be 100% on board.

If you can't make this happen and you're not working on a business which does immense social good and needs your support as a matter of charity, you should quit and go somewhere more comfortable.

It sounds to me like you did a full rewrite by replacing the app piece by piece, sprint by sprint, releasing changes quite often and bringing that value all the way to the user. I think that is really clever.

My impression from others in this thread is that they mean "start from scratch and build until features are on-par with current product" when they say full rewrite.

Your version of full rewrite seems like it is generally applicable, but I have very little faith in the latter approach.

This sounds like the story of a proper, piecemeal, rewrite where the whole team was on board.
That does look like technical bankruptcy. However, rewrites of large projects almost always fail (especially without management buy-in and feature freezes).

A strategy you can use is to incorporate any refactoring into the estimates for "new feature" development, with the idea being that if you have to touch a part of the codebase, it gets refactored.

In this case, since there's no framework, I suggest having a framework gradually take over the functionality of the monolith. The fact that all the routes are in nginx will actually help you here, because you can just redirect each route to the new framework once its functionality has been refactored and ported over.
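Since the routes already live in the nginx config, the strangler move is one location block per migrated feature. A sketch (the route, upstream port, and file path are hypothetical):

```shell
#!/bin/sh
# Carve one route out of the legacy app; everything else stays untouched.
CONF=${CONF:-/etc/nginx/conf.d/strangler.conf}

cat > "$CONF" <<'EOF'
# requests for the rewritten feature go to the new framework app;
# all other locations remain in the legacy 10,000-line config
location /products/ {
    proxy_pass http://127.0.0.1:9000;
}
EOF

# validate and reload only where nginx is actually present
{ command -v nginx >/dev/null && nginx -t && nginx -s reload; } || true
```

The nice property is that rollback is equally small: delete the location block and reload, and the legacy code is serving that route again.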

Do not refactor the database, as interoperability between the legacy project and the new project can break, although migrations should be executed in the new project.

What I do suggest is getting development, staging, pre-production, and production environments going, because you will have to write a lot of pure Selenium tests to validate that you didn't break important features and that you correctly recreated the expected functionality.

You can run these validation tests against a pre-production environment with a copy of production. This also gives you feedback if your migrations worked.

On the team, that's the hard part. If they walk out on you, you will lose all context of how this thing worked.

As a precaution, get them to record a lot of video walkthroughs of the code as documentation, and keep them maintaining the old project while you educate them on how to work in the new system. The video walkthroughs will be around forever and are a good training base for the new senior devs you bring in.

Last, make sure you have good analytics (amplitude for example) so you know which features are actually used. Features that nobody uses can just be deleted.

Over time, you will have ported all the functionality that mattered to the new project and feature development in the new project will go much faster (balancing out the time lost refactoring).

A business making 20 million/year should be able to afford a proper dev-team though, what are they doing with all that money?

You should be able to get budget for a team of 5 seniors and leave the juniors on maintenance of the old system.

I'm not a career dev, but I have inherited teams and projects before that were a huge mess...

This isn't going to come off nicely, but your assumption that it needs a full rewrite is, in my eyes, a bigger problem than the current mess itself.

The "very junior" devs who are "resistant" to change are potentially like that, in your view, for a reason. Given the cluster they deal with, I suspect the resistance is more that they spend most of their time doing it XYZ way because that's the way they know how to get things done without it taking even more time.

What it sounds like to me is that this business could use someone at the table who can understand the past, current, and future business - and can tie those requirements in with the current environment, with perhaps some "modernizing" mixed in.

It makes $20M a year. That sounds like great code.

I would:

1. Get it into source control without "fixing" anything.

2. Get a clone of the prod server up and running, plus a clone of the db.

3. Put in something to log all of the request/response pairs.

4. Take snapshots of the database at several time points and note where they occur in the log history from number 3.

You now have the raw material to make test cases that verify the system works as it did before, bug for bug, when you refactor. If the same set of requests creates the same overall db changes and response messages, you "pass tests".

The first thing to refactor is stochastic code. Make it deterministic, even if it's a little slower, so you can test.

Once you can refactor, you can do anything. Including a full rewrite but in steps that don’t break it.

If you try to rewrite it from scratch, it will probably just never be deployable. But you can rewrite it safely in chunks with the above.
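Points 3-4 might look like this in their crudest form (the log format, clone host, and file names are all assumptions): replay captured requests against a clone and diff what comes back against the recorded run.

```shell
#!/bin/sh
# replay.sh - re-send logged requests to a clone and diff against the
# recorded responses. Expects requests.log lines of: METHOD<space>PATH
CLONE=${CLONE:-http://staging.internal}
: > replayed.txt

while read -r method path; do
    printf '== %s %s ==\n' "$method" "$path" >> replayed.txt
    # </dev/null stops curl from eating the rest of requests.log
    curl -s -X "$method" "$CLONE$path" </dev/null >> replayed.txt
done < requests.log

# same inputs should produce the same outputs as the recorded run
diff recorded.txt replayed.txt && echo "behaviour preserved"
```

Pair it with a dump/checksum of the cloned database after each replay, and you have the request-to-db-state test cases described above without touching the PHP at all.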

20 Million dollars per year.

This should be the thing that starts every conversation. Because IT WORKS for the intended purpose.

Someone else said it. Put everything in source control first.

And just fix things that directly impact that 20 Million dollars a year.

Example, fix speed issues. Fix any javascript issues. Fix anything that will get you to 21 million dollars a year.

Then, if you want, you can put together a small sub-team responsible for transitioning certain pages into a framework. But don't rewrite the whole thing.

Right. An attempt to do a full rewrite just doesn't pass the most basic cost/benefit analysis.

Redoing it in [sexy language / framework / paradigm / design pattern] will feel aesthetic, but even if it goes perfectly, it won't get you to $40M a year.

But if it goes poorly, it might get you to $0M a year (and fired).

The bigger question for OP should be "do I wish to be employed by an organization that is OK with these engineering practices?" Nobody can change culture by themselves, and certainly not just by introducing a sexier technology.

Doing it in a new language will subtract some value. Adding new features may add value. Doing it as an SPA could kill the entire thing.
> Resistance to change is huge

This is the key point. Why is there resistance to change if everything is as bad as you say? How do things look from the perspective of the developers?

There is also a certain disconnect in what you are describing. On one hand you describe the developers as "junior", productivity as abysmal, and getting anything done as impossible. On the other hand the code seems to be highly successful from a business perspective, generating millions in revenue. Something is missing in your analysis.

I have to admit that I would have little patience for such a situation, so good on you for giving it a try, and godspeed. I personally feel that there's a lot of interesting work out there, so I'm not sure I would even take on the project. The fact that you have a team of three people whom you don't manage or have any authority over, who are also majorly resistant to changing their ways of doing their job in literally the worst way possible, does not seem promising, no matter how you look at it.

In these types of situations, the problems are social and possibly political and rarely technical, even though the technical problems are the symptoms that present themselves so readily.

«team is 3 people, quite junior» - run, buddy, run..
Epic fail.

The way to fix things involving people is through something called leadership. That means you need to double down on your soft skills, and you need the explicit support of management. If you hope a framework will do this for you, then you are just as broken as the thing you wish to fix.

Train your team, set high standards, and focus on automation (not tools, not frameworks). This is a tremendous amount of work outside of product. If you aren't willing to invest the necessary extra effort, you don't really care whether it's fixed.

As long as you don't plan a full rewrite all at once, the technical part is not the hardest aspect; it's even the best part!

> fix this development team without managing them directly

This is the worrying part. If you're not their manager, or at least the technical lead, it's a lost cause, because you need to lay out a plan and have complete buy-in from management.

There's almost no realistic salary that can make up for working on (I presume) PHP 5 and this codebase forever, and for the effect on your future career prospects.

It’s your career etc. but as a well meaning random stranger:

>> I know a full rewrite is necessary

Rewrite it in rust! /s

You’re most likely focusing on the wrong thing here. The tech doesn’t matter. It’s a business, and this bit matters:

>> this code generates more than 20 million dollars a year of revenue

You need to be able to quantify which lines of code you’re going to change to increase that $20M number to something higher, or at the very least, increase the amount of it that the business gets to keep rather than burn on costs.

This might sound like a hard problem at first glance but it’s really not.

>> This business unit has a pretty aggressive roadmap

This is a positive. To be clear the worst case is an apathetic business unit. This is huge, you’re already ahead. People want things from you so you’re free to exchange what they want for what you need. Think of other business units as part of your workforce, what can they do to help you?

>> management and HQ has no real understanding of these blockers

Yeah that’s the way it is and it’s totally ok, management doesn’t fully appreciate the blockers impacting the HR unit or plant maintenance or purchasing or customer service or etc etc but they DO NEED to know from you the problems you can see that they care about.

That means issues about code quality being problematic are out of scope, but informing management that your team is going to continue to be slow for now is in scope.

Issues about developing in production are out. Issues like “our working practices are unsafe and we have a high risk of breaking the revenue stream unexpectedly over the coming weeks and months”: that’s in scope for being communicated. At the same time, take them through the high level of your mitigation plan. Use neon lights to point out the levers they can pull for you, e.g. we need SaaS product X at a cost of $$$ for the next year to help us deliver Y $$$ in return.

For every strategic piece of work you line up, be clear on how many $$$ it’s going to unlock.

Be clear on how you personally can fail here. Transparency and doing what you say you will go a long way.

Practice saying no.

You’re an unknown quantity to them so get ahead of that. For example, make it so you’re always first to tell the other units when the product has broken for a customer, rather than customer service telling you about a support ticket that just came in.

First off, source control. I would say this was a day 1 job.
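If there's shell access to the box, the very first commit can simply be a snapshot of whatever is live today. A minimal sketch; the directory and file names are invented for illustration:

```shell
# Demo in a temp dir; on the real server you'd run this in the webroot.
APP="$(mktemp -d)"                 # stand-in for e.g. /var/www/app
echo '<?php echo "hello";' > "$APP/index-new_2021-test-john_v2.php"
cd "$APP"
git init -q
printf '%s\n' '*.log' 'cache/' > .gitignore   # keep junk out from day one
git add -A
git -c user.name=ops -c user.email=ops@example.com \
    commit -q -m "Initial snapshot of production, warts and all"
git log --oneline    # one commit: the baseline everything else diffs against
```

From here the team can keep working exactly as before; the difference is that every change becomes a diff you can review and revert.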

Get some type of CI/devops thing going so you can deploy to a temporary test environment whenever you want. This applies to the data too so that means getting backups working. Don't forget email notifications and stuff like that.

Next comes some manner of automated testing. Nothing too flash, just try to cover as much of the codebase as possible so you can know if something has broken.

Go over the codebase looking for dramatic security problems. I bet there's some "stringified" SQL in there. Any hard coded passwords? Plaintext API calls?
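One cheap way to start that sweep is a grep pass. The patterns below are rough heuristics only; expect false positives and misses, and review every hit by hand:

```shell
# Flag SQL built by concatenating request input, and likely hardcoded
# secrets. Heuristics only; every hit needs a human look.
grep -rn --include='*.php' -E '\$_(GET|POST|REQUEST)\[' . \
  | grep -iE 'select |insert |update |delete ' || true
grep -rn --include='*.php' -iE "(password|secret|api_?key) *= *[\"']" . || true
```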

And now everything else. You're going to be busy.

Security could actually be a way to sell the need for cleanup. Hire a team of independent auditors. If the code is in as bad a state as you claim, I guarantee they will find at least a dozen XSS and XSRF issues, very likely some SQL injections, and possibly even a few RCEs as root.

Maybe not the best way to increase direct revenue if the product is working, but it highlights the risk they are taking with such a shaky foundation, and puts the decision on management's table rather than yours.

Sounds like there's no source control... So you should just pick things off bit by bit. Don't rewrite, especially when you have no process to roll back.

- Source Control

- CI/CD process

- Lock down production so there's no access

- Kill off dead code

- Start organizing and refactoring

etc...

Edit: A lot of people have already said the above. But I want to add:

Just because code sucks and is messy, obscure, unstructured, and breaks everything we learn as developers about 'good code' or 'good coding practices' does not really mean it's bad if it's generating the business money.

It can often be quite fun to work on, because everything is a win: performance, cost reduction, easier maintenance, etc.

1. The existing team clearly has done a great job achieving this level of revenue with a small team. Be sure to compliment them on that and realize that they probably have a very profound understanding of how the tech achieves the business outcomes.

2. Start with source control.

3. Build up test coverage; document and understand all the endpoints and what business outcome they achieve.

4. Build some small things on a new tech stack to train the team.

5. Move over everything when the team is ready.
To be honest, I would run off. This is the kind of hell where nobody wants to work, where only "experts" know how things work and how to expand or fix things. It will secure their jobs but will make yours hell.

You personally will gain no knowledge there, other than that your codebase is hell.

You can try to convince the management to create a next-gen implementation. Not a rewrite: new software that can fulfill customer needs better, compete better, is safer, and is easier to extend in the future.

One thing you can do though is to immediately set up modern practices: SCM, code review, CI, tests (most of the code might not be unit testable in this state, but some tests at least). This way you can see what others do when they add or fix something and learn better (SCM, reviews), make changes knowing that you did not break the whole thing (tests), and have CI to at least ensure the tests run and everything works, gluing it all together.

Good luck

What do you mean by "without managing them directly"? Are you a manager or aren't you? If you're a scrum product owner, you are (or at least you set priorities).

You speak of "resistance to change", from juniors? You are the change. You get to set the agenda, not them. Unless you don't, in which case you can't fix anything. But legitimacy comes not just from authority, but also from rigor. Anything you truly dictate needs to be 100% based in evidence and fact. This means letting go of implied a prioris such as "PHP is bad" and "we must use a framework". The only real constraint is to keep the gravy train rolling.

So what exactly is your role, the thing you were hired for? If it's to manage, manage. If it's anything else, the best you can do is lead by example. But one way or another, you'll have to let go of some things.

What's the problem? This is the best work imaginable.
> generates more than 20 million dollars a year of revenue

> team is 3 people

> post COVID, budget is really tight

Why? All technical details aside if this can't be addressed I wouldn't even bother trying unless I owned stock.

For all we know it's a car auction website and it's selling 500 $40,000 cars.

Like the company actually buys the cars and sells them. $20 million revenue, cost of goods $18.5 million.

For all we know this website could be replaced with eBay or a cheap car-dealer SaaS website.

Nobody mentions Monitoring.

Ensure the beast is monitored, starting with the basics: CPU, disk space, and so on.
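Even before proper tooling, a cron script can cover those basics. A crude sketch; the threshold and alert address are invented:

```shell
# check-basics.sh -- run from cron, e.g. */5 * * * *
# Emits one status line; mails an alert if disk use crosses a threshold.
ALERT="ops@example.com"            # made-up address
DISK_PCT=$(df -P / | awk 'NR==2 { gsub(/%/, ""); print $5 }')
LOAD=$(cut -d' ' -f1 /proc/loadavg)
echo "disk=${DISK_PCT}% load=${LOAD}"
if [ "$DISK_PCT" -gt 90 ] && command -v mail >/dev/null; then
  echo "Disk at ${DISK_PCT}% on $(hostname)" | mail -s "ALERT: disk" "$ALERT"
fi
```

Once that exists, graduate to real monitoring (e.g. Prometheus node_exporter or a hosted service), but even this catches the disk-full outage before customers do.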

Then put everything into version control. Then make sure changes cannot be done in production; you need CI/CD. Just build one step at a time.

Do not aim for perfection, just concentrate on having a framework(mentality) of continuous improvement.

You've been given the opportunity to test all your skills on a thing that "works" (makes money); you just need to find the metrics of where the money comes from and how to maximise it.

Pareto principle can be of help when making decisions.

Hi! I specialise in fixing failing projects. Yours is probably not the worst I have seen.

First of all, don't do a rewrite! Your team most likely does not know what they need to know to be able to perform a clean rewrite. You are new and still probably don't know the whole picture and all the knowledge that is in the application in one way or another. If you start a rewrite, productivity will plummet and you will have to keep choosing whether to put resources on the new or on the old, and the old will always win. I have seen this play out many times: the rewrite keeps getting starved of resources until it gets abandoned.

Refactoring is better, because you can balance allocating resources to refactoring as you go, and also keep bringing in improvements that make BAU development more efficient.

Do not make the mistake of forgetting about "the business". They are probably already irritated by the project and will be on the lookout for any further missteps from you. You might think you have good credit with them because they just hired you, but that might simply not be the case. Their fuse is probably short. You need to keep them happy.

At first, prioritise changes that improve developer productivity. This is how you will create the bandwidth necessary for further improvements. This means improving the development process, improving the ability to debug problems, and improving the parts of the application that are modified most frequently for new features.

Second, make sure to prove the team is able to deliver the features the business wants. The business probably doesn't care about the state of the application but they do care that you deliver features. This is how you will create the credit of trust with them that will allow you to make any larger changes.

Do make sure to hire at least one other person who knows what they are doing (and knows what they are getting into).

This is a lot. I have done a rewrite approach before. It is only one option, but if you're committed, it's probably the one that has the best chance to preserve your sanity. It can work, if you're clever about it.

The goal is to slowly build up a parallel application which will seamlessly inherit an increasing number of tasks from the legacy system.

What I would start with, is building a compatibility layer. For example: the new code base should be able to make use of the old application's sessions. This way, you could rewrite a single page on the new system and add a reverse proxy one page at a time. Eventually, every page will be served by the new application and you can retire the old.
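Since routing already lives in nginx, that page-by-page cutover can happen there too. A sketch of the idea; upstream names and ports are placeholders:

```nginx
# Pages already migrated go to the new app; everything else stays legacy.
upstream legacy_app { server 127.0.0.1:9000; }   # old PHP-FPM pool
upstream new_app    { server 127.0.0.1:9001; }   # rewritten service

server {
    listen 80;

    # Migrate one route at a time by adding exact-match locations here.
    location = /checkout { proxy_pass http://new_app; }
    location = /account  { proxy_pass http://new_app; }

    # Default: the existing application keeps serving everything else.
    location / { proxy_pass http://legacy_app; }
}
```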

I would stick with the language but pick up something more capable, e.g. Laravel. This makes it easy to copy over legacy code as needed.

Godspeed.

Do. Not. Full. Rewrite. It would be absolute suicide and almost certainly fail. Just put that option out of your head.

1. Complete a risk assessment. List all the security, business, availability, liability, productivity, and other risks and prioritize them. Estimate the real world impact and probability of the risks, describe examples from the real world.

2. Estimate the work to mitigate each risk. Estimate multiple mitigation options (people are more likely to agree to the least bad of multiple options).

3. Negotiate with leadership to begin solving the highest risk, lowest effort issues.

But before you begin all that, focus on the psychology of leadership. Change is scary, and from their perspective, unnecessary. The way you describe each risk and its mitigation will determine whether it is seen as a threat or an exciting opportunity. You will want allies to advocate for you.

If all of that seems like too much work, then you should probably either quit, or just try to make small performance improvements to put on your resume.

Upper management need to understand the problem and the options and need to buy-in on whatever you want to do.

Practically: cut the bleeding, get the current team at least using version control and working with a CI environment. That will be a lot of effort (been there before with a similar .Net product but much better team).

Then you're going to need significant resources to re-build on a modern architecture. I would simply go with releasing another product if that's at all possible. You clearly have some market and channel to sell into.

Just beware: this sounds like a problem which will take 3-5 years to solve and whose chance of success is dependent on organisational buy-in. So you need to ask yourself if you're willing to commit to that. If not, quit early.

1) add source control and put a deploy system in place (start with manual steps, then automate what makes sense)

2) depending on the size of your db, you may want to just go with a shared dev db.

So now you can fix and enhance things in dev

3) add in a modern web framework. Depends on your app but I would go with something like Symfony: same language, can integrate old stuff you don’t want to rewrite yet.

4) Slowly and steadily migrate your routes to the new framework based on the new requirements

The last point is key: it is very easy to miss crucial logic hidden in existing code.

When approaching seemingly insurmountable technical issues, I've found it important to find the root issue of what's causing all the chaos. Yes, the code is a mess. OK, the database design is non-sensical. Sure, things never get deleted. But why?

From what you've mentioned, it sounds like every change that isn't additive is viewed as too risky. So at this point, before trying to make big shifts, some work should be done to de-risk the situation as much as possible. Granted, you probably can't stop work and introduce a bunch of new practices and patterns, but you need to start reducing the risk to unleash the team to make necessary changes.

For example, introducing version control should be a slam dunk. Start using a database migration facility for all database changes. Create a release schedule that requires features to be stabilized by a certain window for deployment. Create some really, really simple Selenium tests by just browser recording yourself using the app.

Once you can start making changes more confidently, then you can start unwinding some of the bad choices moving forward. Resist the urge to start "making a good foundation for the future" by trying to rewrite core parts of the system immediately and instead start thinking in terms of forward progress oriented changes. Need to add a feature? Make sure to write that feature properly with good practices and make only the necessary changes to other parts of the system. I realize that's probably going to be painful, but eventually you will accrete enough of these small changes that you can string them together with a little more work into larger scale changes under the hood.

These things are rarely easy, especially in established legacy systems. But if this is the revenue engine for your company, you'll need to move conservatively but decisively or risk making the situation worse. Good luck!

This is really good advice. Thanks for sharing this. I think you've probably nailed it on the head about trying to find the reason and it likely being because people are too afraid of the risk of messing stuff up, so they keep adding to the ball of mud.

However, in my experience, it can be very touch and go dealing with people who become so risk averse. I had a job one time where a previous employee refused to give up a computer that had to be at least 10-15 years old at the time (was running Windows 2000 or something like that) and took about 30 minutes to boot up. Because somehow it was the only computer that could run the 3D CAD program he was familiar with or some other "reasons" that it was essential to the project. The only way of moving forward was that he luckily completely washed his hands of the project before I even joined the company, and then I did a full rewrite and redesign of the whole system from scratch (which was absolutely required in that case). Even then, when I asked him about the computer to just be careful and do due diligence to try and figure out why or if it was actually important, he was very resistant of me sending the computer to the salvage department after getting the files off the computer.

What I want to know is, is this actual software sold for $20 million a year?

Or does this software "facilitate" $20 million of revenue, instead of generating it single-handedly?

What if we're talking about a car sales website that 'generates' $20 million in revenue by selling 500 $40k cars?

Check if you're breaking the law. Those junior programmers may well have not thought to worry about privacy laws or industry regulations.

Then check for the most basic security issues like the database being accessible from the outside, SQL injection, etc.

Then set up monitoring. It's quite possible the thing is falling over from time to time without people knowing.

> ... fix this development team without managing them directly ...

That is your core problem. If you are not directly managing then how can you bring about any changes?

If HQ management can't see the problems you see, then you are unlikely to receive any support for the changes you are contemplating.

Your number one problem is politics not technology.

People suggest rewriting little-by-little. Does that really work in practice? And why would one do it? Why not let that business rot-in-place so to speak, while building a new business on a new platform to "compete" with it?

I worked for a company early in my career that sold a $1500 piece of software and had revenue of $15 million. When I was there, the head count was 70. Ten years later the head count is two: one engineer and one person to take the orders. And revenue was still a couple million. A classic "rot-in-place" situation.

Rewriting little by little just means: each time you make a change, leave the source base at least a little better than you found it. Leave a few comments about the thing you reverse engineered. Delete a little dead code. Eventually you get the confidence to move from the lowest-hanging fruit to deeper refactoring. You do it because that approach may be the best you can do with a rotten source base within your time and resource constraints.
Ask yourself two questions. Why is it that the things are the way they are? What can I realistically change? Then determine the overlap in these. If there is none, walk away. It makes no sense to go for a rewrite without the underlying causes being addressed, you’ll be in the very same mess very rapidly. It makes no sense to replace the team without understanding why these folks have endured, who has hired them etc. Understand first and then make changes.
Not sure if this helps, but if I were you, I’d:

* Create a git repo from the code as it exists

* If the other team is still doing things live, create a workflow that copies the code from the prod server to git as-is nightly so you have visibility into changes. Here’s an opportunity to see what the team gets stuck on or frustrated with, and you can build some lines of communication and, most importantly, some trust. You can suggest fixes and maybe even develop the leadership role you need.

* Get a staging instance up and running. If I had to guess why the team does things live, maybe the project is a huge pain to get sample data for. If that’s the case, figure out the schemas and build a sample data creation tool. Share with the team and demonstrate how they can make changes without having to risk breaking production (and for goodwill - it helps prevent them from having to work evenings, weekends, and vacations because prod goes down!)

* PHP isn’t so bad! Wordpress runs a huge chunk of the web with PHP!

* tailwind might be a cool way to slowly improve CSS - it can drop into a project better than other css frameworks IMO

* Pitch your way of fixing this to management while quoting the cost of a rebuild from different agencies. Throw in the cost of Accenture to rebuild or whatever to scare management a little. You are the most cost effective fix for now and they need to know that.

Code is always part of a larger business strategy. You are working for a small business that has found a way to leverage large revenue off of cheap talent. A full rewrite will destroy this business. Instead look for low hanging fruit like teaching the devs source control in a respectful way that is actually useful to their existing work process.
Run.

What you need to do is a full rewrite. You need the business owners backing you up on this intention. From your description they don’t understand the scale of the problem. So that’s a dead end.

When they understand that they have to halt all new development for a few years and drastically increase the development team's budget in the meantime, you can start thinking about how to proceed. But they will not.

It works, it generates a large amount of revenue

leave it the fuck alone

There’s that joke graph about “happiness in the life of a Thanksgiving turkey”, where things are going amazingly right until a straight drop to zero. It works “right now”, up until it doesn’t, or there’s an outage you can’t recover from, or some bad code wipes prod and your backups are useless (in this case likely nonexistent).

It also sounds like it isn’t really working even right now: from what OP claims, productivity is not at all able to meet the deadlines being imposed by upper management. Death by competitors moving faster with a better product is a real thing, and if the tech stops them from doing so, that’s a problem.

The best strategy probably isn’t a rewrite, as others have suggested, but “don’t touch it if it works” is frankly an irresponsible strategy.

I’ve worked in a team where poor core tech (along with a sort of emperor-has-no-clothes situation where upper management found it politically impossible to acknowledge the issue) directly killed the profit, although this was in the market-making space, which has a much more direct reliance on technology. They got into their situation with exactly the attitude of “if it works, don’t touch it!” and basically stayed still while the competition flew ahead of them. Their product “worked”, in that it did what it was supposed to, but iteration on quality of trading and strategies was next to impossible.

A product/service in this state is a ticking timebomb. The fact that it's responsible for that amount of revenue makes it more dangerous. There are probably tens (maybe hundreds) of vulnerabilities that either compromise the whole platform, or at least give access to all customers' data.

IMO there are three realistic approaches:

- Keep it in its current state with the intent of making as much money as possible until the timebomb goes off, and then run away. Just to be clear, I don't think this is ethical, but a lot of people would choose it anyway.

- Ship-of-Theseus it into a supportable state.

- Leave ASAP so it becomes someone else's problem.

IMO the first one is only an option for the people who run the company. The manager of the dev team only has the second and third options, because when the timebomb goes off, they are going to be the scapegoat, not the person running off to the Bahamas with a sack of cash.

I've seen multiple ticking timebombs like this go off in years past, and I was usually part of the heroic efforts to stop the money hemorrhages that ensued afterward. I strongly recommend avoiding it altogether.

Honestly, I know this answer isn't going to be that popular in some circles, but.. yes, leave it. If it offends your sensibility so much that you just can't be 'caretaker' for this mess, walk away.

But if it's "working fine and generating heaps of cash" as far as upstairs is concerned, there is no way you play the 'refactor/redesign/replace' game and come out ahead.

“productivity is abysmal which is understandable. The mess is just too huge to be able to build anything.

This business unit has a pretty aggressive roadmap…”

Sounds to me like it doesn’t work.

Also go work somewhere close to your standards. From experience I can tell you that this is a battle you can’t win in a reasonable time table. There’s a reason it is the way it is and you can’t change those people.
This is the correct answer. I’ve heard enough stories on HN about nightmare-level codebases that churn out massive profits. One dude had his entire application in a single PHP file and was generating like 20k/month.
Why is the junior, non-productive team resistant to change? Answer this and you might get an answer on the path forward. It sounds like this team wasn't responsible for this mess, but then they should be excited to try something better. On the other hand, if they think this is all great, then why is productivity poor? If this is all PHP code, I'm not sure what the difference between front-end and backend would be. What is the mobile person doing on prod PHP code?
As I see it, nobody on the team, except for the OP, sees this as a problem. IMHO, any software exists to solve a real-world problem. It does not exist solely for its software architecture, its tests, its UI, or its maintainability. If the stakeholders of the organization don't value the time taken to roll out new features, or think that constant bug-fixing is in the nature of software, then they don't place any value on that software.

This is pretty apparent since they seem to be earning 20 million dollars with software managed by three junior engineers.

My advice to the OP: if you value good software engineering, this is not the organization you should be working for. Because no matter what you do, your effort will not be appreciated, and you'll be replaced with a junior developer as soon as management deems it necessary.

First, leave the company: three junior devs supporting a $20m system isn't realistic.

Hey, I've done this. Everyone saying "just rewrite each part" isn't really being helpful.

You first need to fix up obvious brokenness: turn on error logging and warnings within FPM, then fix absolute path issues, then fix any containerization issues (deps, etc.) and containerize it, then roll out some sort of linter and formatter.

At this point you have a CI system with standardized formatting and linting. Now slowly part things out, or do a full rewrite, since you can now read the code and make changes locally.
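For the error-logging step, a php.ini / pool fragment along these lines is enough to stop problems from being silently swallowed (the log path is a placeholder):

```ini
; Surface problems instead of hiding them.
error_reporting = E_ALL
log_errors = On
error_log = /var/log/php/app-error.log
; Never echo errors to visitors on production.
display_errors = Off
```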

> This business unit has a pretty aggressive roadmap as management and HQ has no real understanding of these blockers. And post COVID, budget is really tight

Listen to the more experienced people in the thread. They have good advice. Probably ignore the people who were on one lucky project that worked out with a risky full rewrite.

But the business' ambitious but naive plan is not viable, and it's your job to communicate why, and to figure out how a less ambitious series of slower goals could be achieved. If I were in this position as an IC, I'd literally just refuse to shoulder the stress of naively agreed-upon deadlines, because it wouldn't be feasible unless I risked burnout for a probably not-enough salary.

Rewrites aside (it may or may not improve things or work), the thing I have noticed/learned about HUGE codebases which are 'bad' is to implement "feature options" and have an "unbreakable rule" against coding to a client/actor/instance.

The nice thing is you can start with the current codebase and add these in; it will make a rewrite a lot easier since your capabilities/feature configs are already extracted.

Example:

If your product/code serves multiple customers, you should never have:

    if ($customerId == 123 || $customerId == 999) {
        // do things this way
    } else {
        // do things the other way
    }

instead always aim for feature options (config?):

    if ($config->featureA) {
        // do things this way
    } else {
        // do things the other way
    }

If this seems not related to your codebase or product, you just need to dig deeper, it's usually there in some form or another.

PS. If you think the above is 'obvious', you have probably not seen an old enough (or bad enough?) codebase. Few coders start out with the bad case; the bad case (coding to an instance/customer) is those 'quick fixes' that accumulate over the years.

The most important thing is that you communicate right now.

From the HQ perspective they make a lot of money with very few developers and all seems to be going well, with no problems at all. Judging by the spreadsheets this looks great!

Your task is now to explain to them the risks involved with proceeding forward. You can also present them a plan to mitigate that risk without interrupting ongoing operations too much, and put a money figure on it. Ideally you present three options, one of which is doing nothing. Be aware that the decision on this is not yours; it is theirs. Your task is to tell them everything relevant to that decision. You can also tell them that your professional opinion is that this is something that should have been done years ago, and that the fact that this didn't explode in their faces yet was pure luck. But again, it is their decision.

How you lay it out depends on you, but there have been many tips already. Version control might be the first thing. Maybe you can present it as: one day a week goes towards maintenance or something.

As an aside this helps to cover your own behind if nothing is done and everything goes south in a year. Then you can point to that extensive risk analysis you presented them with and tell them you told them so.

If management aren't on board, and have an aggressive roadmap, and it's pulling in $20mil, it deserves to fail. They're on borrowed time, and likely they're the reason for the current situation. Run like the wind.
I've been there in the last 2 years, and we went from a massive spaghetti ball to about 50% rewritten and 60% under tests.

I second most comments against the "full rewrite" here:

- source control it

- get a local environment if you can

- write tests before changing/deleting something

Adding tests can be hard at first. The book "Working Effectively With Legacy Code" by M. Feathers contains useful techniques and examples.

Be wary of Chesterton's fence: "reforms should not be made until the reasoning behind the existing state of affairs is understood". Don't fix or remove something you don't understand.

Unless you have a good reason not to, I’d quit. It’s highly unlikely you’ll be able to change the culture without a ton of frustration. It’s just not worth it unless you enjoy that kind of challenge or are being well compensated.
> I know a full rewrite is necessary, but how to balance it?

No, rewrite over time. There's an extremely high chance there is complexity you do not understand yet.

> - it has been developed for 12 years directly on production with no source control ( hello index-new_2021-test-john_v2.php )

First immediate win: start using source control. Initially people can operate the same way they have been, just through git. Slowly but surely clean up the old files and show people how they are not lost, and how it cleans up the code. Then switch to more advanced code management practices, like a master branch vs working branches, code reviews, etc.

> - the routing is managed exclusively as rewrites in NGInX ( the NGInX config is around 10,000 lines )

Make sure this is definitely checked into git. Ideally you look to simplify this somewhat, you don't really want to be so heavily tied to the server.

> - the database structure is the same mess, no migrations, etc... When adding a column, because of the volume of data, they add a new table with a join.

A migration to a better database setup takes time. As long as there are no fires, treat it as a black box until you have time to fix it up. Just double check their backup strategy.

> - team is 3 people, quite junior. One backend, one front, one iOS/android. Resistance to change is huge.

It sounds like you are new to their team. You need to win hearts and minds. One small thing at a time.

> This business unit has a pretty aggressive roadmap as management and HQ has no real understanding of these blockers. And post COVID, budget is really tight.

Explain to them that the code is like an old house. It has had lots of investment over the years and has generated a lot of profit. The problem is that, over the years, the foundations have crumbled, and although the walls look nice, the paint covers serious cracks. While you could continue using it as it is, one day it will simply fall down - unless time is invested today to maintain it.

They will then say "well, what needs to be done?". And you need quite a concise and well thought out way to respond to that question.

I’ve been in this exact situation, and honestly - just move on.

I mean, if you see this a fantastic opportunity to grow or whatever then fine, have at it.

However, you’re going to be fighting a two-front battle, both against the devs and against management, for widely different reasons. It’s going to take a toll on you.

Ask yourself if you really want to spend the next few years doing work you probably won’t see any recognition for.

How big is this code base and how advanced are the features? With only 3 juniors behind the wheel is it really that big? Was it always this small or is this the leftover maintenance team?

Is there documentation, requirements or user stories available for the existing features? Is it B2B or B2C? If it's B2B it becomes a lot easier to do customer survey of what is actually used and could help you remove half of the 12 year legacy.

Apart from the lack of source control, the rest of the issues, while far from best practices, honestly don't sound extremely bad. Lack of a framework or DI is not an antipattern in itself, even if it can be. The productivity of 3 juniors, split one per stack, doing both operations and feature development on such a big application is going to be low even with better practices. If revenue really is 20M and this code is critical, it sounds like you are understaffed.

Skipping the SCM, deployment and process improvements, as others already gave good suggestions, and assuming you need to keep the existing code: one thing that has not been mentioned is static analysis. If the majority of the rat's nest is in PHP, one thing you should do is add static type checking. It has zero effect on production and makes the code infinitely easier to navigate. It will expose how much of the code is dead, how much is shared, what depends on what, etc. From there, refactoring will be a lot easier and safer. As others suggested, you obviously need tests around it as well.
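The comment doesn't name a tool, but PHPStan is one common choice for this; a minimal config might look like the following (level 0 is the most tolerant, and the `paths` entry is a placeholder for wherever the code actually lives):

```neon
# phpstan.neon - start at level 0 so legacy code produces a manageable error list,
# then ratchet the level up over time.
parameters:
    level: 0
    paths:
        - public
```

Running `vendor/bin/phpstan analyse` then reports undefined functions, unreachable code, and bad call signatures without touching production.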

Big rewrites are almost always a bad idea. I took part in flushing 2 man-years early in my career and more recently 4 man-years. Both were rewrites that couldn't get to shore. Painful both times, more so the second time since I was one of the implementors.

It's almost always better to do small replacements. Peel the onion so-to-speak. Refactor from within first to make a migration plan away from the crufty tech possible.

First and foremost: make a plan and sell it to the devs. If you don't get buy-in from them, nothing will change.

Good luck.

What percentage of the $20M are you getting per year? If it's less than double digits then you should run as fast as you can and look for more fun and rewarding work.
You should absolutely quit and work somewhere else. You're not going to learn many useful things; at best you'll have a horrible time and not improve the company's bottom line, so they won't care and you won't be rewarded.

It could be much worse. You could break something and cost the company money.

First thing: talk with both management (or whoever non-technical you need to report to) to explain the situation and negotiate a timeframe. Then talk to the techies too and kindly tell them that things need to change immediately.

After you've got a working time window for getting things right, prepare a workflow that should take half the time you've discussed, as it will probably take twice the time than anticipated. (if you've negotiated on 3 months of fixing the mess, assume you have only 1.5 months or even 1 month and prepare 1 month's worth of work)

Then I think the very first thing should be moving to Git (or another VCS), setting up development/staging environments, and using CI/CD.

After making 100% sure the environments are separated, start writing tests. Perhaps not hundreds or thousands at this stage, but at least ones that catch the big, critical failures.

After that, start moving to a dependency manager, resolving the multiple-version conflicts in the process.

Then find the most repeated parts of the code and start refactoring them.

As you have more time you can start organizing code more and more.

It sucks but it's not something that can't be fixed.

Also, given the work environment before you came, it might be a good idea to block pushes to the master/production branch and only accept changes through PRs with all tests required to pass, to prevent breaking anything in production.
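As a sketch of the enforcement side (the file name, the lint target, and PHP being preinstalled on the runner are assumptions): with GitHub Actions, a required check as simple as a syntax lint already blocks the worst breakage from reaching the protected branch.

```yaml
# .github/workflows/ci.yml - minimal required check for PRs
name: ci
on:
  pull_request:
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Syntax-check every PHP file; fails the job on the first parse error.
      - run: find . -name '*.php' -print0 | xargs -0 -n1 php -l
```

Pair this with branch protection requiring the `lint` job, and direct pushes to production history simply stop being possible.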

I won't repeat what others said, but you have no idea how common it is for a newbie to come onboard a new team and think "everything is wrong, it needs to be fixed and I will fix it". No doubt the things you listed can be improved, but they were done that way for a reason, and you already mentioned budgetary constraints, so it sounds like the existing team made the best of what they were given and you will have to adapt to them as well. Improve what you can as opportunities arise: migrating away from a 10k-line nginx config might sound like a good idea, for example, but you will spend resources on something that won't make a noticeable difference. Instead, you can implement a PHP router for new endpoints only.

The worst thing you can do is come in with that attitude and expect the team to be on board. You will only alienate yourself. Try to understand why things were done the way they were (never architected, but put together piece by piece over time). Make them feel heard and pace yourself with any changes.

> The worst thing you can do is come in with that attitude and expect the team to be on board. You will only alienate yourself. Try to understand why things were done the way they were (never architected, but put together piece by piece over time). Make them feel heard and pace yourself with any changes.

Yep, I agree 100%, the last thing you want to do in this situation is piss off the three people who know how this thing actually works.

IMO the real thing OP needs to decide is whether he's willing to fix the whole thing himself or if he wants (needs) the existing devs to help. If he wants to go it alone then he can take any of the advice given here and do whatever he wants. But, if he wants the team to help then his main priority is to understand their current processes and how they get things done, and then look at where more modern practices can be introduced to improve things for the team and get them to buy in.

Sure he said their "resistance to change is huge", but mine would be too if someone joined my team and determined literally _everything_ needs to be changed immediately (even if it's completely true). I would bet they would be much more receptive to realistic suggestions after you get an understanding of their process, gradually building towards a better one. And if they're not, then OP should probably just go look for another team/job. It seems pretty clear that 'actual' management doesn't care about this (which is to be expected, I mean they apparently have a functioning product bringing in 20m) so as much as it sucks the situation is what it is.

I think something a lot of commenters are missing is that people who have worked like this for a long time are often massively resistant to using source control, even after having it explained to them.

Even getting that process to stick properly ("Step 1") will be a challenge, never mind resolving the other 10 complaints in OP's list.

I have been in similar situations.

Code can be fixed, but people sometimes can't be. You need to break down the "resistance to change" somehow. Trying to convince people can burn a lot of time and effort on its own. If you can't easily convince them, and you can't overrule them to dictate the direction, don't even bother.

You need people and you need budget. The business doesn't understand bad code, but you should find a way to make them feel fear. They have been drinking poison for years without feeling ill. Make them understand how easily the house of cards could come crashing down.

About this line:

> This business unit has a pretty aggressive roadmap as management and HQ has no real understanding of these blockers. And post COVID, budget is really tight.

You should quit, there is no solution to that.

Get out. I’ve been there and tried in earnest fixing things. Without management’s understanding and an “aggressive roadmap”, you’re doomed to fail and burn yourself out in the process.

It got this way exactly because management doesn’t see the point or the problem. The fix isn’t technical (not yet), it’s cultural and strategic first which isn’t something you have control over.

I love this! Others have pointed out already how to untangle the mess, so I'd like to point out one important fact: you can generate 20M in revenue with absolute shit code. Yet a lot of devs over-engineer software that never generates a single dollar in revenue. Sure, it's not just black and white, and this is definitely a nightmare. But I think a lot of code is "too good". Always make sure that the code paths you drive to perfection are actually generating value for the user.
> Resistance to change is huge

I wonder what you proposed, how you proposed it, and to whom?

If it's to the business unit I'd go with stuff like "If we make a mistake and it brings the site down it hurts income, so we should have source control and dependency management, automate deployment…" etc. They think their ideas will make more money than yours and they won't be reasonable about things they don't understand. Everyone understands big screw ups and websites that are down.

Once you have that, you can kill two birds with one stone by documenting all the APIs using integration tests. Use the same fear-of-destroying-income argument.

Once you know the APIs you can chop things into pieces and improve code and put boundaries around tasks. You can start to cache things because you know what the API behind it expects. Then you can build new APIs with adapters behind the cache and slowly introduce them.

You can build the stuff the business unit wants.

If you can't excite your developers with the possibility to design and build new APIs like that, then:

a) you need to brush up on your "soft" skills

b) you need to move on or ask for more money/perks

What is your true objective? What will you be evaluated on?

Focus and think of any other improvement you could do.

It sounds like management doesn’t think there is an actual problem to solve, so I wouldn’t necessarily pick refactoring or rewrite as the hill to die on.

If you go the refactoring route, i have little advice:

0. Clean up the database, it will immediately impact performance and make management happy

1. Find vertical (feature-wise) or horizontal (layer-wise) architectural boundaries and split the code base into modules, separate libraries. This will be an ongoing process for a long while. Do it by touching as little code as possible - this is pure scaffolding; actual refactoring comes later.

2. Stick with PHP, at least until results from #1 aren’t good enough.

3. Use testing as a tool to pressure management, it works a surprisingly large number of times

4. Rewrite one feature/page at a time, once results from #1 indicate a good candidate. It might be a good idea to introduce a new language at this point, or even some form of micro services (if it makes sense).

Top of mind:

- Teach the team VCS 101 and put the code under version control; do trunk-based development
- Ask about critical places in the code and add logging
- Implement dead-simple feature/experiment toggles
- Set an example: develop all new changes as simple functions using TDD
- Put yourself in "harm's way": do a live coding stream where you show your team how you do it
- If they like it, offer to do pair programming sessions
- Add Composer and start moving dependencies there, one by one
- Repeat 100x

That’ll be 200$ lol

It's not clear from your post what your role really is (is it something like lead dev? or just a more opinionated member of the team?) but if you're not managing the team directly, then don't manage them. It's not your job and no-one likes that. If they wanted to make you the manager, they would have. And they didn't.

There's really only one way to help improve a codebase / development process in a situation like this: one small incremental step after another, for a very very very long time. If you don't think you can enjoy that and have the patience to stay with the problem for a few years, consider looking for another job.

I think this can only be solved by some kind of consultant, who is not part of the company and can talk frankly about the issues here. As programmers we see everything as a technical problem, but in your case it's more than that. It's unwillingness to change and blindness to the actual situation.
Haven’t seen anyone here mention instrumentation. Once you get source control set up, I would lean hard into metrics and observability, so you can easily identify and eliminate dead code, and also figure out what’s the most important.

Same for the DB - instrument your queries, figure out what your most important queries are.

Hire a senior engineer. They would probably 1) put the whole thing into version control, 2) streamline the deployment process so that specific versions can be pushed easily into a sandbox or production environment, 3) begin writing tests, starting with end-to-end and system tests in this case, and hook them into a continuous test harness, 4) do an architectural review to identify major components that can be split off from the monolith in order to reduce the surface area to work on, starting with a business-critical area where you get the most bang for your buck, 5) add unit tests and integration tests for this and the main component, 6) repeat as necessary.

I don’t believe this is a problem that can be solved with people skills alone. It requires senior technical expertise.

PHP is great in the sense that you can easily combine legacy and modern code in the same codebase. Just do a new_index.php for all new stuff. I'd start building new features, and features in active development, the 'modern way', and keep the legacy code as is. When the new way has been established and the team has become accustomed to it, it becomes easier to gradually rewrite old parts when necessary. You might find that lots of the old code doesn't need to be rewritten, but can be managed as a legacy part of the app: mostly frozen but working code.

You should also understand the audience. Who are the users of the app? It sounds like the app does not need high reliability or availability, or any of the stuff that's required for typical mass-market web apps. Understanding this might give you some room to improvise.

The thing makes over $20m a year, has only 3 junior folks for support / maintenance / development (with junior salaries presumably), and budget is tight?

Run. The problem here is emphatically not on the technical side.

Bad code doesn’t matter. It’s clearly making money in spite of the bad code. But engineer efficiency matters, especially if it’s blocking the future. So your goal should not be to rewrite anything, but to find ways to increase productivity. You’ll find that when viewed through that lens, some parts of the code and processes will have to change but other parts simply don’t need to be changed (even if it makes you cringe). The better bit though is that you’ll be more aligned with the business, whereas rewriting for the sake of improving bad code is not aligned with the business.
Try to "fix", or at least go to the bottom of people issues first.

For example, not having version control was already unacceptable 12 years ago. Someone on the team must be strongly opposed to it. Find why. If no-one is against it, just set it up yourself. If it's management, and you know it's not going to change, find some other management to work for.

Rinse and repeat for all the low-hanging fruit.

After that... Good luck.

I think you already gave the answer to your self, but didn't realize. You have two options:

1.) Leave this mess behind you and quit - and miss an opportunity to learn a lot about code, yourself, teamwork and solving real world problems

2.) Work together with your team and solve problems, that probably will improve your skills more than anything in your future

I recommend you give 2.) at least 6 months before you quit.

What I would recommend:

- Create a git repository (I would not init it on the production server, but copy the code over to your machine, init, experiment a bit, and if you found a reliable way, repeat this process on the server)

- For the first weeks, continue developing on the server with one main branch, but at least push it to a central repository, so that you have a kind of VCS

- Setup a dev system, that points to a cloned (maybe stripped down) prod database, where you can test things

- Add composer in dev and see, if you manage to migrate this to production

- As you said, you already have an API, that is called via curl. That might be the way out of your mess. Create a new API namespace / directory in the old code base, that is fully under version control, uses composer and as little of the OLD mess of code as possible (you won't get out of this with a full rewrite). Write unit tests, wherever possible.

- I recommend to use jsonrpc in your situation, because it is more flexible than CRUD / REST, but this is up to you

- Get SonarQube up and running for the new API and manage your code quality improvement

- New features go to the new API, if possible

- Start to move old features to the new API, create branches and deploy only ONE folder from dev to prod: the api directory

- The database mess is a problem, that you should not solve too early...

This should take roughly a year. Have fun ;)
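For the Composer plus new-API-namespace steps, the starting point could be as small as this (the package name, namespace, and directory layout are all placeholders):

```json
{
    "name": "acme/legacy-app",
    "autoload": {
        "psr-4": { "App\\Api\\": "api/src/" }
    },
    "require-dev": {
        "phpunit/phpunit": "^10"
    }
}
```

The old require_once code keeps working untouched; only files under api/src/ opt in to autoloading, unit tests, and whatever else Composer pulls in.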

Thank you for so many suggestions. The main issue is productivity, within a context where the company is trying to reinvent itself in terms of marketing and business model. As a consequence, many new big features are being requested and promised by management to headquarters. But in recent years, all big evolutions have been failures. That's why I've been asked to intervene. I love the idea of the strangler pattern combined with big unit-test coverage.
> This has as a consequence that many new big features are being requested and promised by management to headquarters. But in recent years, all big evolutions have been failures. That's why I've been asked to intervene. I love the idea of the strangler pattern combined with big unit-test coverage.

The first thing that needs to be strangled is unachievable management promises. Figure out how to get local management to not write checks the tiny team can't cash. A team of 3 juniors will likely overestimate their ability to deliver, so you've probably got to teach them to say no to things they can't do also.

It's pretty risky to delete code you don't understand. Is it harming you if you don't refactor?
> team is 3 people, quite junior. One backend, one front, one iOS/android. Resistance to change is huge.

I did not get this: if it is three people who are junior, how do they resist any changes?

Since it is only three, could you get to hire someone senior and start untangling?

Doesn’t sound like the worst code really; sounds like the average older php codebase I run into. I maintain (inherited) products that are over 15 years old and I find it enjoyable. I would be able to slowly move this thing to modern standards without rewriting or breaking anything; been doing that on very large php projects for a decade. Probably doesn’t need a rewrite, just see it as a bonsai tree.
Talk with the team. Hear their pain points and propose source control and a test and dev environment.

If you are not managing them directly and they don't want to do those kind of things because it sounds hard or foreign, then you can't really do anything about it.

Quite a few replies with various good suggestions in a short span of time. However, I could not understand what exactly the problem is that you are trying to solve.

A $20m/year business is pretty impressive with that kind of spaghetti code/tech amalgamation. It would certainly be a fun project for your more junior developers to dig into it and understand the actively used features. That raises my next question: what exactly is wrong with your 3-person development team? Are you expecting only the 3 of them to make major changes, let alone a full rewrite, for such a project?

The way I see it is that you only have enough development resources to make minor changes or features that fit in the project's current spaghetti framework. Is that what management wants? If they want some big new features your only option is to find path of least resistance to implement them, especially if your budget is tight. Basically, add more hack-fixes and continue feeding the monstrous legacy. Unless you get more people, more budget you don't really have a choice of doing things "the proper way".

Two things stand out.

That's a very profitable business off 3 junior devs, so there is money for more (and more senior) people.

The junior devs can't possibly like working like this - it will be through necessity and fear that they push back on you. Ask them what they think could be done to improve things and start there. Remove the fear of change.

If they are just being protective and won’t accommodate change then replace the most influential one with someone more senior once the team can cope with the loss.

I think you have to be prepared to wait 4 years to build up the political capital to even suggest some of the massive changes you’d be advocating.

Or at least, have the team listen.

Sounds like Canada Computers. I'd start by introducing source control and tests. Those are low cost with a high impact on stability.
If you don't manage them directly that sounds like you don't have authority.

Without the authority to make changes, this will be very hard to do given the scope of changes required. Soft skills and influence work up to a point, but given your remarks about resistance to change this is a big challenge.

You need to ask for the proper remit and authority, or decline and move onto another project or job.

I know the typical piece of advice is to never rewrite but I am going through a similar situation and a rewrite would probably have been better and simpler.

The key is, though, that you don't rewrite the code, you rewrite the app. Figure out what the functional pieces of the app are and what it's supposed to do. Don't use any ActiveRecord-style ORMs, so Laravel is out. If the app is that bad then the SQL database is probably a huge mess. If it has to be PHP, use Symfony and Doctrine.

Build an MVC version of the application.

If there was any sort of structure to the application then the refactor not rewrite approach would be correct but if it’s anything like what I think it is, it’s a fucking mess. Refactoring will just make a bigger mess.

If you can get away with refactoring pieces at a time into symfony components until you can eventually have an MVC framework then do it but likely that would be a much bigger task.

Three engineers have built and supported a codebase generating 20 MM in revenue a year. Maybe get off your high horse, it’s really easy to mistake a complex order for chaos from the sidelines.

Rewrites are almost never the answer unless you wrote the previous version. Sure, to most of us here the code you’re describing might look like garbage, but it works and certainly a ton of wisdom has been embedded into it that will be difficult to replicate and understand unless you dive into what exists now and try to work with it on its terms for a little while.

I did a major rewrite early in my career based on something someone else built, and it was a total disaster for a while. I thought I knew better as an outsider looking in, and sure, eventually we did improve things, but a lot of my choices were not best practices, but some form of fashion.

No source control? There's "it works" and then there's "they got lucky enough to not implode yet".
The NGINX config description actually had me laughing out loud. That one seems particularly heinous.
Start with writing integration tests. Worry about touching the code only after you have a full test harness. Using an external tool like Playwright, Cypress, or Selenium you can write the tests in a language of your choice without touching the code.

Deploy the code into a staging environment (make a copy of prod). Kubernetes might be useful to try to package the application in a replicable manner. Then get the tests running on CI.

When the tests cover literally everything the app can do, and everything (tests/deployment) is running on CI, changing the app becomes very easy.

Your junior coders no doubt have been yelled at many times for attempting changes and failing. When they begin to understand that change without breakage is possible, their confidence will increase, and they will become better coders.

Resist the urge to change the application at all until you have tests.
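A sketch of what such a harness can look like before any browser tooling is involved (the paths and expected status codes are invented; in real use `fetch` would wrap requests, Playwright, or curl, while here a stub response map keeps the example self-contained and runnable offline):

```python
# Black-box smoke check: compare each critical URL's status code to what we expect.
CRITICAL_PATHS = {
    "/": 200,
    "/login": 200,
    "/api/products": 200,
}

def smoke_check(fetch, expectations=CRITICAL_PATHS):
    """Return (path, expected, actual) for every path that misbehaves."""
    failures = []
    for path, expected in expectations.items():
        actual = fetch(path)
        if actual != expected:
            failures.append((path, expected, actual))
    return failures

# Offline demo: a dict stands in for the real site, with one broken endpoint.
stub = {"/": 200, "/login": 200, "/api/products": 500}.get
print(smoke_check(stub))  # -> [('/api/products', 200, 500)]
```

The same expectation table works unchanged whether the fetch function points at staging or production, which is exactly what makes a refactor auditable.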

> - team is 3 people, quite junior. One backend, one front, one iOS/android. Resistance to change is huge.

With good reasons I might say :).

I've not yet seen any comments along the lines of: ask your team what they think is wrong and what can be improved.
Quit. Not because I think it's too big of a mess to fix, but because you don't fit. You won't change them and they won't change you.
Rewrite is rarely the best approach. The best approach is to refactor it by order of importance. Break it up and go from there. If it's not broken, don't fix it.

The best approach is to:

-Assess the situation

-Create a task list

-Decide what needs immediate attention

-Create a time line for it all

-Get feedback from team

-Add the business roadmap to your list

-With upper management work on a timeline

-Define your project with realistic times

Execute and manage the project.

It took 12 years to get to this point so don't expect to change it overnight.

BTW, this type of team and codebase is not out of the ordinary. Companies start to program with the idea that eventually the problems will be fixed, yet it never happens. Upper management does not care because all they care about is reducing cost and getting the results they need. You're dealing with the results.

The problem here is normalization of deviance across the entire organization. That has nothing to do with tasks.

Task #0 is to sit down with the team and ask them to say in their own words what they think about the project, about engineering good practices, etc. See how aware they are of the problem they have created.

Try to understand how the status quo became normal and acceptable, before the same thing happens to you.

If this shit happened in the first place, it was likely because everyone was too busy living in their Jira alternate reality, where you benefit from the perverse incentives made possible by the lack of visibility into code quality.

Good point. It's hard (impossible?) to fix the problem if you don't fix what caused it.
The challenge is quite interesting IMHO, but the fact that the budget is tight and the team is a tiny mess of junior people with "resistance to change" makes me think burnout would be around the corner.
I would suggest you take some money and hire someone from the outside to tell management and the devs exactly that. They will listen because it is coming from the outside. Then you can decide whether to change something or not; it doesn't matter what. Personally I would also go for the low-hanging fruit first: backups of the database, and git-push all the code that lives on production.

Rewrites are super dangerous, doubly so if the team is junior: they would need to duplicate all the features, migrate, and develop the skills they lack now; otherwise you end up with the same mess in a new framework.

1. Find out the "real version" of the code.

2. Find out the "real version" of the sql schema.

3. Make some method of running this code + nginx config locally.

4. Add a test framework which simulates real traffic you see on the app and make sure the DB contains the right thing.

5. Make a staging environment which mirrors traffic from prod and run changes there for ~1 week, manually auditing things to make sure they look right. (You'll only do this until you feel safe.)

Now you can feel safe changing things! You can tackle problems as they come in. Spend 10% of dev time on new features and 90% on reducing tech debt.

Lots of dead code? Tackle that.

Package management hard? Migrate to composer.

Don't do everything up front. Just make sure you have a way to change the code, test those changes, then push them to staging / prod.
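For step 3, a docker-compose sketch is one way to run the code plus the nginx config locally (the image tags, mount paths, and the existence of a MySQL database are assumptions):

```yaml
# docker-compose.yml - local replica of the prod trio: nginx, PHP-FPM, database
services:
  web:
    image: nginx:1.25
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro   # the real 10k-line config, verbatim
      - ./src:/var/www/html:ro
    ports:
      - "8080:80"
  php:
    image: php:8.2-fpm
    volumes:
      - ./src:/var/www/html
  db:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: dev-only
```

The rewrite rules may need their fastcgi_pass pointed at php:9000 instead of a local socket, but everything else can stay byte-for-byte identical to prod.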

20 years old, no source control and 20 million in revenue? What is this? Maybe I can make a competitor

I'm guessing this is a medical billing system of some sort, lol

Fire the front/backend developers and hire the best PHP full-stack dev you can. Start a new PHP framework project and move code over. 2003-style PHP code is straightforward and easy to port. If you try to move to a new cool language you will lose whatever is special now.
Maybe pessimistic: leave and find a place to work on something you enjoy. If you stay you will be left fighting everyone and doing all the grunt work by yourself whilst others keep pushing more code in for you to cleanup.
> And post COVID, budget is really tight.

> this code generates more than 20 million dollars a year of revenue

Budget is probably not as tight as you think

Revenue is quite different from profit.
Build a deployment server and a dev server. You can do this without the team knowing.

Do a SWOT analysis with the team. Make them answer why it takes days to do simple changes. Make them answer how they'd recover prod if the disks died.

Block access to prod. The team has to code on Dev and upload their artifact to cicd.

They'll hate the change but it's policy and it's enforced. What are they going to do?

Block artifact upload to deployment. They have to merge a branch instead. Be extremely available to help them learn the SCM tool.

They'll hate the change but policy, etc.

Set up a work tracker that lets you link bugs to features. Populate it with historic data. Triage it extensively. Show the team how each bug comes from an earlier change. Show the team git bisect. (You'll need a test server at this point.)

Set them a target: average time per feature or issue. You'll abolish this metric once it's attained for the first time. In the meantime, it's hard to game the metric, because the codebase is fucked.

Wait, and see if they come up with anything on their own - dinner is cooked when the team starts having interesting thoughts.

If they fail to work it out, you'll need to coach them. Give them little breadcrumbs.

You want them to understand:

- slow delivery == poor business outcomes
- bugs == poor business outcomes
- git helps with bugs
- CI/CD lets you write code
- testing reduces (delivery time + bugfix time)

Only when the team understands this can they do the work of fixing the app. (IMO that's a total rewrite, but you're not short of advice ITT.)

My experience with dumpster fire legacy systems is to at least ensure you have proper backups in place so you can roll back if the worst happens.
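A concrete minimum, assuming MySQL (the database, paths, and credentials file below are pure placeholders): a nightly logical dump on a cron schedule, with credentials kept out of the crontab itself.

```
# crontab entry: 03:00 nightly dump, compressed and dated; ship it off-box too.
# /etc/mysql/backup.cnf holds the [client] user/password; "appdb" is a placeholder.
0 3 * * * mysqldump --defaults-extra-file=/etc/mysql/backup.cnf --single-transaction appdb | gzip > /backups/appdb-$(date +\%F).sql.gz
```

--single-transaction keeps the dump consistent for InnoDB without locking tables. And a backup nobody has ever restored is not a backup, so test-restore it into the new dev environment.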
The place to start is version control, a dev environment and a CI process. Only then can you start to tackle the code. Priority number one is to have some control over the whole thing.
You won't be able to fix it unless you get business to see it as a blocker. So that's the first task.

Second task is to come up with a plan for your refactor. Break it down with time estimates, etc.

First of all, start by preserving the current state of the application in version control.
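A sketch of what "preserve the state in version control" can look like in practice. The directory and file names below are invented for the demo; on the real system you would run the same commands at the web root:

```shell
# Minimal sketch: snapshot an existing tree into a fresh git repo.
set -e
APP=/tmp/legacy-snap-demo
rm -rf "$APP" && mkdir -p "$APP"
echo '<?php echo "hello";' > "$APP/index.php"
echo 'secret=hunter2'      > "$APP/config.local.php"  # must stay out of git

cd "$APP"
git init -q .
git config user.email dev@example.com
git config user.name dev

# Keep secrets, logs, and caches out of history from day one.
printf '%s\n' '*.log' 'cache/' 'config.local.php' > .gitignore

git add -A
git commit -qm "Initial snapshot of production code, warts and all"
git log --oneline
```

The point is to commit the mess exactly as it is, duplicate `index-new_..._v2.php` files and all; cleanup comes later, with history to fall back on.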

Then start thinking about replicating deployment of the application. At the start it can be a script that compresses the files and extracts them into the production environment. This will help you build similar environments, or other more experimental development environments.
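A sketch of such a compress-and-extract deploy script, assuming a releases directory and a "current" symlink (all paths are invented). The symlink flip keeps the switch close to atomic and makes rollback a one-liner:

```shell
# Minimal tarball deploy: package the source, unpack into a dated
# release directory, then repoint the "current" symlink.
set -e
BASE=/tmp/deploy-demo
SRC=$BASE/src
rm -rf "$BASE" && mkdir -p "$SRC" "$BASE/releases"
echo '<?php echo "v1";' > "$SRC/index.php"   # stand-in for the codebase

STAMP=$(date +%Y%m%d%H%M%S)
tar -czf "$BASE/release-$STAMP.tar.gz" -C "$SRC" .

mkdir -p "$BASE/releases/$STAMP"
tar -xzf "$BASE/release-$STAMP.tar.gz" -C "$BASE/releases/$STAMP"

# Rolling back later is just: ln -sfn <previous release dir> current
ln -sfn "$BASE/releases/$STAMP" "$BASE/current"
ls "$BASE/current"
```

The web server's document root would point at `current`; old release directories double as instant rollbacks until you prune them.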

Once you have a flow of the application state management and deployment under control, you can start building on top of it.

The most valuable work would be to build a separate test suite that documents the most mission critical parts of code or application.

Only after this would I try to reason about changes to the application. The great part is that you have the Nginx configuration as an abstraction layer. From there you can dissect the application into smaller pieces and replace it one redirect at a time.

If the application has an expected lifetime of over 2 years, these changes will pay for themselves through faster development cycles and easier maintenance of both the team and the codebase. This can be a selling point to management for the roadmap or for recruitment.

Good luck.

This sounds like a dream to fix.

I'd start by small incremental changes. A big change will be resisted.

Deployments first, separate environment next etc

> This business unit has a pretty aggressive roadmap as management and HQ has no real understanding of these blockers. And post COVID, budget is really tight.

Quit.

Find a job where management has half a clue and is reasonable.

I've been in quite a similar situation, in which we tried to redo everything without proper knowledge. That was a catastrophe. Please, don't promise a panacea to the managers: better code is not necessarily more profitable. New mistakes will (inevitably) be made, and old stuff may become unstable. Also, it's not uncommon for developers to bail out due to pressure. Have your team prepared for baby steps.

* Before doing any actual work, I'd suggest everybody read Clean Code and Clean Architecture. You'll have a better understanding of SOLID principles by then.

* Start by adding version control and a separate environment for development / testing.

* Try refactoring the least important things first. If they crash, it won't be so critical. The most complex modules will end up with higher quality.
Lots of great advice in this thread, heed it.

I would just add -- embrace the challenge. It actually sounds like a fun problem. After many years in tech, I've learned that I'd rather work on improving a pile of shit codebase that produces a lot of value than a pristine perfect codebase that does not.

Let me rearrange some of your points:

> - it runs on PHP

> - it doesn't use composer or any dependency management. It's all require_once.

Great --- explicit dependencies are better than magic. Personally, I'm a fan of require rather than require_once, because of some history, but require_once is mostly fine.

> - it doesn't use any framework

> - no MVC pattern of course, or whatever pattern. No templating library. It's PHP 2003 style.

This is the proper way to run PHP. Can you imagine if they used frameworks? It'd be a slow mess, with about 70 different frameworks. At least this is likely a bare metal, fast mess.

> - this code generates more than 20 million dollars a year of revenue

> - team is 3 people, quite junior. One backend, one front, one iOS/android. Resistance to change is huge.

So you've got 3 junior people managing $20M of revenue.

> - productivity is abysmal which is understandable. The mess is just too huge to be able to build anything.

> I have to find a strategy to fix this development team without managing them directly.

> This business unit has a pretty aggressive roadmap as management and HQ has no real understanding of these blockers. And post COVID, budget is really tight.

HQ doesn't understand the process and can't even budget a manager, because apparently it's not your job to manage them. I'd bet their requirements are unclear and poorly communicated too.

> - the routing is managed exclusively as rewrites in NGInX ( the NGInX config is around 10,000 lines )

Great, the routing is one place!

> - no caching ( but there is memcached but only used for sessions ...)

Do you actually need caching? You didn't say anything about the performance, so I'm guessing not.

> - In many places I see controllers like files making curl requests to its own rest API (via domain name, not localhost) doing oauth authorizations, etc... Just to get the menu items or list of products...

Curl to the same server is a bad pattern, yeah. Localhost versus domain name doesn't make it better or worse. Figure out how to make those a call to a backend service, maybe? Are you also saying this is running on a single machine? (I think you are, but you didn't mention it.)

> - it has been developed for 12 years directly on production with no source control ( hello index-new_2021-test-john_v2.php )

Ok, check in what you have, and make a deployment procedure that doesn't suck, and set things up so you have to use the deployment procedure.

> - no code has ever been deleted. Things are just added . I gather the reason for that is because it was developed on production directly and deleting things is too risky.

If you can, run profiling on the production site to see what code appears to be dead code, and run down the list.
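One cheap way to start that list without touching PHP at all: compare the entry-point files on disk against what the access log shows ever being requested. Everything here (paths, log format) is invented for the demo, and this only finds unused entry points; files pulled in via `require` need real profiling:

```shell
# Hypothetical sketch: list PHP entry points that never appear in the
# access log -- those are dead-code candidates.
set -e
D=/tmp/deadcode-demo
rm -rf "$D" && mkdir -p "$D/web"
echo '<?php' > "$D/web/index.php"
echo '<?php' > "$D/web/index-new_2021-test-john_v2.php"

# Stand-in for /var/log/nginx/access.log
echo '1.2.3.4 - - [01/Jan/2024] "GET /index.php HTTP/1.1" 200 123' > "$D/access.log"

(cd "$D/web" && find . -name '*.php' | sed 's|^\./|/|') | sort > "$D/on-disk.txt"
grep -oE '/[^ "]+\.php' "$D/access.log" | sort -u > "$D/requested.txt"

# Lines only in on-disk.txt were never requested.
comm -23 "$D/on-disk.txt" "$D/requested.txt"
```

Run against a few weeks of real logs before believing the output; a file only hit by a monthly cron job would otherwise look dead.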

> - the database structure is the same mess, no migrations, etc... When adding a column, because of the volume of data, they add a new table with a join.

Depending on the size and volume of the database and the operational requirements, this is kind of what you need to do. Do you have anyone with operational database experience who could help them consolidate tables, if that's what's really required? Is the database a bottleneck? You didn't say that, you just said you didn't like it. There's ways to add columns and migrate data, but it requires either downtime or a flexible replication system and some know-how. Consolidating the tables without at least write downtime is going to be a lot more challenging than if they had the opportunity to add columns at the right time... of course, sometimes having tables with a join is the right thing to do anyway.

Is there budget for a staging system, complete with enough database instances to test a data migration and time to do it? Maybe focus on developing a plan for future column additions rather than trying to clean up the current mess.

> - JS and CSS is the same. Multiple versions of jQuery fighting each other depending on which page you are or even on the same page.

jQuery is pretty backwards-compatible, right? You can make a list of all the pages and all the jQuery versions, and maybe make time to test updating the pages with the oldest versions to newer ones, etc. Again, a staging system would help with testing. Developing a testing plan and running the tests is something that doesn't require much from the three overworked developers, but could be offloaded to a manager.
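A quick way to build that list: grep the pages for jQuery script tags and tally the versions. The page files below are invented for the demo:

```shell
# Inventory which jQuery versions appear across the codebase.
set -e
D=/tmp/jq-demo
rm -rf "$D" && mkdir -p "$D"
echo '<script src="/js/jquery-1.4.2.min.js"></script>' >  "$D/home.php"
echo '<script src="/js/jquery-3.6.0.min.js"></script>' >  "$D/cart.php"
echo '<script src="/js/jquery-1.8.3.min.js"></script>' >> "$D/cart.php"

# Count occurrences per version; swap -h for -l to list which pages
# use each version instead.
grep -RhoE 'jquery-[0-9]+\.[0-9]+\.[0-9]+' "$D" | sort | uniq -c | sort -rn
```

The oldest versions at the bottom of that tally are the natural place to start the upgrade testing.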

I do like it when someone else saves me a lot of typing. Very much agree with all this.

(Obviously the real problems are political, but ignoring that...)

Seems to me that after it's in source control and a dev/staging system exists, the next step is to add in a data access layer - move all the raw SQL etc out of the main codebase into either new PHP code or a web service. Then add a bunch of logging so it's possible to discover what parts of the system actually get used. The data layer can then get useful test coverage, allowing the DB to be safely rearranged. The next step is to treat the rest of the PHP app as a black box and write tests around it with something like Selenium, and the work of replacing it with some other boring but more modern technology bit by bit can begin.
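For the black-box test layer, even before reaching for Selenium, a golden-master harness over HTTP catches regressions cheaply: record known-good responses once, then diff every later run against them. A toy sketch, where the endpoint, port, and python3 stand-in server are all invented; in reality you'd record responses from the legacy PHP app:

```shell
# Golden-master smoke test: record a known-good response once, then
# diff later runs against it byte for byte.
set -e
D=/tmp/golden-demo
rm -rf "$D" && mkdir -p "$D/site" "$D/golden"
echo "menu: home,products,cart" > "$D/site/menu.txt"

# python3's built-in server stands in for the legacy app.
(cd "$D/site" && exec python3 -m http.server 8123) >/dev/null 2>&1 &
SRV=$!
trap 'kill $SRV 2>/dev/null' EXIT
sleep 1

# First run records the golden copy...
curl -s http://127.0.0.1:8123/menu.txt > "$D/golden/menu.txt"
# ...later runs must match it exactly, or the diff fails the script.
curl -s http://127.0.0.1:8123/menu.txt | diff - "$D/golden/menu.txt" \
  && echo "menu endpoint unchanged"
```

The same loop over a few dozen real URLs gives a crude but effective regression net around the black box while the internals get rearranged.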

I'd view it as 12 years of experience and data gathering upon which to build a really stellar rebuild with all the lessons learned ;)
Don't. Just leave. To each, their own mess.
Are you looking to hire at all? Seems like a worthy challenge.

There have been some great replies here, and I agree step one is version control.

No you don't need a full rewrite. You said it already: this code generates more than 20 million dollars a year of revenue (reality: it's probably not just the code generating that revenue).

You need to introduce things bit by bit to convince the team. Start with version control.

Seeing a lot of people say that a rewrite is a terrible idea, but (as someone who doesn’t understand why) I’d love to hear a more fleshed out explanation re: why exactly that’d be a bad idea.
Rewrites frequently fail or go massively over budget/schedule.

It can be difficult to fully replicate the existing system and there are frequently important but subtle reasons why the existing system has the architecture it does.

To the extent that one can make modular changes and address the most-important pain-points, one probably should.

Sometimes a complete rewrite is a better choice, but if embarking on that path, a fail-fast attempt at an MVP might be the right style to do so. If the MVP crushes the existing system in performance/benefits, then subsequent iterative development may yield a viable re-written replacement system.

having seen several rewrite attempts in my career, none of which were fully successful, here are some thoughts:

- The ultimate reason: it will take too long and be over budget. The business will (rightfully) ask why they should invest x amount of capital just to get essentially the same feature set back. Businesses do not care about what's under the hood.

And here is why:

- the rewrite team usually does not fully understand the edge/corner cases that the current mess handles but obscures

- the rewrite inevitably ends up following the same patterns that the original did, leading to the same unusual/weird cases

- rewrite teams get too ambitious and attempt to over-abstract and over-engineer, eventually creating another mess understood only by them

A rewrite is likely to take a lot of time, and if you're not careful, it's easy to end up running two systems rather than one at the end: the new one that doesn't quite do everything, and the old system that still does some important things.

In addition, if you don't change the development conditions, you're likely to end up with a similar mess at the end. Sometimes, code is messy because you didn't know what you were doing when you started and a rewrite could help; but sometimes code is messy because the requirements are messy and change rather a lot --- a rewrite can't help much with that.

That doesn't mean never do a rewrite, but you've got to have a pretty good reason, and it sure helps to have an incremental plan so that you don't end up with two systems and so that you start seeing the fruits of your labor quickly.

doesn't the strangler pattern assume it's already modularized within the monorepo?
Not if you treat the application itself as the module to replace. Essentially your "new app" starts out as an HTTP proxy in front of the old one, and then you implement routes/modules in the new one, one by one, replacing the proxy with the new logic as it comes online.
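A minimal sketch of that proxy layer as nginx config; the upstream names, ports, and the one migrated route are all illustrative:

```nginx
# Everything defaults to the legacy app; migrated routes are peeled
# off one location block at a time.
upstream legacy_app { server 127.0.0.1:8080; }
upstream new_app    { server 127.0.0.1:9000; }

server {
    listen 80;

    # Already reimplemented in the new app:
    location /api/products { proxy_pass http://new_app; }

    # Everything else still hits the old monolith:
    location / { proxy_pass http://legacy_app; }
}
```

Each time a route is ported it gets its own location block pointing at the new app; once the catch-all is the only legacy route left, the monolith can be retired.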
Revenue is $20 million annually, there are 3 developers and budget is tight? Can we drill into this?
so, what's the problem of the code base business-wise?
"Business team have an aggressive road map" and "productivity is abysmally low".
There is no logical connection from "aggressive road map" to a messy code base. There are lots of ways to solve the problem of "productivity is abysmally low", e.g. training or coding guidelines, given the team are all juniors. Without an objective analysis, the problem you can see is always the phenomenon, not the root. It appears to me more like an internal power struggle OP wants to win than a real tech problem he/she wants to address.
So, I do have a war story about something like this, possibly in a worse state. And possibly with somewhat higher stakes (around $400M/year at the time). I came in as a consultant with my own "parallel implementation team". In my case I was somewhat lucky because most of the system was composed of batch jobs. They did have "frameworks" with "ORMs", but they had 4 or 5 of them, with many files being pinned to some older version. Which meant that actually there were dozens.

There were thousands and thousands of business rules no one knew why they were there and if they were still relevant. I remember one fondly. If product=="kettle" and color=="blue" and volume=="1l" then volume=1.5l... This rule like many others would run on the millions of product lines they would import daily. And the cutest thing in the system was that if any single exception happened during a batch run... the whole run would fail. And every run would take close to 15 hours (sometimes more).

Not going into details ... But they couldn't afford the run going over 24 hours... And every day they were inching closer.

Similar to OP they extensively used EAV + "detail tables" to be able to add "things" to the database.

The web application itself was similar but less of a time-bomb. It was using some proprietary search engine that was responsible for structuring much of the interaction (a lot of it was drill-down in categories).

Any change on the system had to happen live with no downtime. Every minute of downtime was $1,000 in lost revenue.

The assumptions we had were:

1. At some point the system will catastrophically fail, so 100% of the revenue will be lost for a long time.

2. Even if it were possible to rewrite the system to the same specs (which it wasn't, because no one knew what the system actually did), such a rewrite would probably be delivered after the catastrophe.

The approach we used was to:

1. Instrument the code to see what was used and what wasn't. We set some thresholds, and we explained to the stakeholders that they were potentially going to be losing revenue/functionality. And we started NoOping PHP files like crazy. Remember, whatever they did, the worst thing they could do was raise an exception.

2. Transform all batch jobs into async workers (we initially kept the logic the same), which together with #1 allowed us to group things by frequency.

3. Rewrite the most frequent jobs in a different language (we chose Ruby) to make sure no debt could be carried over, and NoOp the old code.

4. Proxy all HTTP traffic and group coherent things together with front controllers that actually had 4 layers: "unclean external", for whatever mess we got from the outside; "clean internal", which was the new implementation; and "clean external" and "unclean internal", which would do whatever hacks were needed to recreate side effects that were actually necessary. The simple mandate was that whenever someone made any change to frontend code, they needed to move the implementation to "clean external".

5. Port the most crucial, structuring parts over to Ruby as independent services (not really micro-services, just reasonably well-structured chunks that were sufficiently self-contained). If I remember correctly this was something of the size of "User" and "Catalog browser"; the other things stayed as PHP scripts.

6. And, with savagery, any time we got the usage levels of anything low enough... we'd NoOp it.

Around a year in there was still a huge mess of PHP around but most of it was no longer doing any critical business functions. Most of the traffic was going through the new clean interfaces that had unit tests, documentation etc. I think that 100% of the "write path" was ported over to Ruby. A lot of reports (all of them?) and some pages were still in PHP.

I don't think anyone ever noticed all the functionality that went away. We had time to replace the search engine with Elastic Search. It wasn't clean by any means but it was sturdy enough not to have catastrophes.

The company was bought by some corp around that time... and they transitioned the whole thing to a SaaS solution. I was no longer involved for quite awhile so I only heard about it later. But we bought them that extra year or more.

So, as far as recommendations go:

1. Instrument the code (backfire.io!).

2. Find bang for the buck and some reasonable layer separation, and do it chunk by chunk.

3. Don't try to reproduce everything you have. Go for major use-cases.

4. Communicate clearly that this is coming with functionality loss.

5. Be emotionally ready for this being a long, long journey.

GTFO. Can't be saved and rewrite is too expensive.
You have to fire them. Hire replacements first, then fire them. No matter what you do, they will think they are smarter than you because they built a successful product. They haven't seen any other way of doing anything. If you don't get rid of these idiots you will never be able to fix anything; they are going to go around you to management and blame every single issue that comes up on you and your changes, and eventually you are the one who gets fired.
You don't have a software problem, you have a people problem.

You have some directly measurable consequences of the underlying issues, as well as some obvious risks that are generally being ignored. Start with those:

1. Productivity is abysmal. Measure the time to implement a feature to get a feel for how long things actually take. How long does it take a feature from being requested by management to being released?

2. Unstated, but I'm guessing that release quality / safety is generally low. (due to lack of testing / staging / devops / source control). Measure this by looking at whatever system of bugs you can get (even if that's just show me all the emails about the system in the last year).

3. An aggressive roadmap. You're going to have to find some balance and negotiate this. If you happen to find a way to make the software better, but don't deliver any value to the business, you've failed. Learning this developer failure mode the hard way kinda sucks as it's usually terminal.

4. Resistance to change is huge. The team have so far been successful in delivering the software, and their best alternative to changing what they're doing for something else might just be to quit and do that something else somewhere else. What incentive do they have instead to change what they're doing here? This likely involves spending time and money on up-skilling. You've identified a bunch of areas that could be useful, now you've gotta work out how to make that change. E.g. actual time to attend paid courses during work hours on how and why to use git. You mentioned budget issues, but it's worth considering this old homily:

> CFO: "What happens if we spend money training our people and then they leave?"

> CEO: "What happens if we don't and they stay?"

5. You can see a bunch of risks, and the team knows them too. Right now, the team probably mitigates them informally with practices learnt from experience. (E.g. the add a new table with a join approach). Because the risks are adequately mitigated in their minds, there really isn't a problem. You're the problem for not seeing their mitigations. That said, by taking the approach of getting the team involved in risk planning, you may see them reevaluate those approaches and come to some opinions about what they need (i.e. source control, tests, devops, etc.)

6. Your people problem is such that you're going to have to convince the existing team to accept that they made mistakes. However you do that you're asking the team to reevaluate their output as a success and instead accept that they are failing. This might be the hardest part of any of this. To do so is going to take untangling the team's identity from their output. If you don't have the soft skills to do this, you'll need a mentor or stakeholder that can help you develop these. You will fail if you don't accept this.

7. Lastly, you're fighting against one of Einstein's quotes "We cannot solve our problems with the same thinking we used when we created them". Are you sure you can fix the problems created by the team, using only the members of the team? Unless you can change their thinking significantly, or add more people with different thinking (yourself and one more developer), then you're bound to fail.

I'd echo a bunch of jeremymcanally's comments below [1]

On the technical sides:

1. Buy each developer a copy of "Working Effectively with Legacy Code" by Michael Feathers [2]. Book club a couple of chapters a week. Allocate actual work time to read it and discuss. Buy them lunch if you have to. The ROI of $100 of food a week and several hours of study would be huge. Follow this up with "Release It!" by Michael Nygard [3].

2. Don't rewrite, use the strangler fig pattern [4] to rewrite in place. Others in this post have referred to this as Ship of Theseus, which is similar (but different enough). Spend some time finding some good youtube videos / other materials that go a bit deeper on this approach.

3. In the very short term, try to limit the amount of big changes you're bringing at once. Perhaps the most important thing to tackle is how each page hits the DB (i.e. stand up an API or service layer). If you try to change too many things at once, you end up with too many moving pieces. Once the impact of the first thing is really bedded in and accepted, you've earned enough trust to do more.

4. Stop looking at the symptoms as bad, instead always talk in terms of impact. By doing this you ensure that you're not jumping to a solution before examining whether the issue is as big as it seems, and you acknowledge that each suboptimal technology choice has real business level effect. E.g.:

- Lack of dependency management isn't bad, the problems it causes are the real issue (spaghetti code, highly coupled implementations, etc.). The business values predictability in implementation estimates.

- Lack of source control isn't bad, not being able to understand why a change was made is the real problem. The business values delivering the correct implementation to production.

- Lack of automated testing isn't bad, but spending time on non-repeatable tasks is a problem. The business values delivering bug free software in a reasonable time.

- Lack of caching isn't a problem, but users having to wait 30 seconds for some result might be (or might not if it's something done infrequently). The business values its users time as satisfied users sell more product.

[1]: https://news.ycombinator.com/item?id=32883823

[2]: https://www.oreilly.com/library/view/working-effectively-wit...

[3]: https://pragprog.com/titles/mnee2/release-it-second-edition/

[4]: https://martinfowler.com/bliki/StranglerFigApplication.html

Currently 15 months into a similar situation. Successful product with year-on-year revenue growth. Key lessons learned:

- You need to get an understanding of why things are the way they are. A team of 3 people seems small. Is the team always in firefighting mode due to the business constantly dropping things in their lap?

- Do not attempt a full rewrite. Here be dragons & krakens.

- One of the first things to do is to get your code into source control before you do anything else. That gives you insight into how often the code changes and in what way it changes.

- The routing, templating, caching, curl-request, and dependency-management issues all stem from the no-framework issue.

- You are going to face varying levels of resistance. Part of that is going to come from the business side of things.

My suggestions:

- You need to get management to understand the problems and on board with reform as soon as possible. Avoid framing the issues as technical problems. Explain the potential risks to the bottom line resulting from business-continuity failure or regulatory/compliance failure (especially if your industry is health/finance/insurance). If management is not on board, your reforms are very likely dead in the water. It might be best to cut your losses.

- Get your code as-is into git ASAP.

- You will need more hands. At the very least, you need a senior who can help hammer things into a structured pattern that the juniors can follow.

- Carrot is going to be much more effective than stick for convincing your devs to adapt to new changes. Understand their pain points and make sure to frame things as not questioning their competence. The understanding needs to be that their time is valuable and should be spent on the things that deliver the most value to them and to the business.

- The business unit needs to rework its aggressive roadmap. I suspect there's an element of "we always have delays in releasing, so we need to keep the pressure up on developers to keep momentum up". You need some kind of process in place for managing roadmaps. (We're currently working our way towards scrum across the business. It's difficult, but persistence even in failure is important.)

- We've attempted a rewrite of one of our products. It took much longer than we planned (it's currently still in progress). What we're doing now is using Laravel as a front end to the legacy apps (Laravel receives the request and passes it on to the legacy app in the same request). It is working well so far and has the advantage of allowing us to use Laravel's tools (query builder, Eloquent, views, etc.) in the legacy app. Then we can progressively modernize the legacy functionality and move it fully into Laravel.

Also, remember to breathe and take a break now and then. Wishing you good luck. If you want to talk more or just vent, hit me up at voltageek [at] gmail.com.

Walk away.

Life is limited, do you want to spend 5 years of it here?

This is a tough tough situation. There are no easy answers or quick wins here. So before we even think about code, let's ask some questions...

1) You said you can't manage this team directly. Is it your responsibility to make this team successful? I know it's annoying to see a team with horrible code and who refuse to change. But is your manager expecting you personally to fix this? If not, just leave it.

2) Even if it's your responsibility, is this where you want to spend your time? As a leader you have limited time, energy and political capital. You need to decide strategically where to spend that time to have the best impact on your company and to achieve your personal career goals. The fact that you can't manage them directly makes me think that they're not your only job. If it's just one area of your responsibilities, I'd consider letting this team continue to fail and focus on other areas where you can make some wins.

3) Is how the business views this team wrong? They're making a lot of revenue with a very cheap team who seem to be very focussed on delivering results. Yes I know, it's annoying. They're doing everything wrong and their code is unimaginably dirty. But... They're making money, getting results and neither they nor the business see any problem. So again... should you just let it be?

4) Ok, so if you're absolutely committed that this code base has to be fixed... maybe you should just find a different job? Either in the same company or in a different company.

5) Ok, so it's your problem, you want to solve it and you're unwilling to leave. What do you do?

Well, anyone can make a list of ways to make the code better. Because this team has been doing everything perfectly wrong, it's not hard to find ways to improve: source control, automated testing, CI/CD, modern libraries, SOLID, clean architecture, etc, etc.

You can't quietly make the changes, because the team doesn't agree with you. And even if they did, this hot mess is way past the point of small fixes. You need to put in some solid work to fix it.

So you need buy in from management. You either need to deliver less while you improve the code base or spend more money on building a larger team. But since they see no problem, getting their buy in won't be easy.

Try to find allies, make a pitch, frame the problem in business terms so they understand. Focus on security risks and reputational risks. And don't give up. You may not convince them today, but if you make a pitch, they will remember in 6 months time, when this team is still floundering. They will remember that you were the person who had the answers. And then, they may come back and give you the time and resources you need to clean up the code base.

So in conclusion. If it's not your problem, ignore it. If you have other teams to manage that aren't a mess, focus on them and let this one fail. If you're going to be responsible for this pending disaster, quit. If you absolutely insist on making a change, start with getting buy in from management. Then incrementally work down the technical debt.

Just quit. It’s not worth saving these people if you’re not getting paid to.
You didn't state your position within this mess. Why is it your problem to fix? Hopefully you are being paid well, if not I might just move on immediately.
If you don't manage that team and "the thing" is working, you need to define what problem you are trying to solve and why. You are describing software that is written in a naive or obsolete way, but other than doing development in production, there is no critical problem you can fix to bring immediate value.

I saw this in the past a few times. There is no universal recipe, if that is what you are looking for. Get a development and staging environment set up and make them use Git; that's a start. See what the plan for that software is. Maybe the company does not want (you) to waste time and money on it; if they want to do something, discuss and align on that.

In the end, if it works it brings value. If you want to rewrite it, it will bring some value and some cost: which is bigger and what is the priority, a rewrite or new features?

One more thing you can do is show the developers how to do some things in a better way, like composer or cleaning up versions and dependencies, but take it easy and present it to them in a way they will buy into and do themselves, not because you told them so. Make them better and they will make the product better.

I wouldn't change the structure. Purify the codebase into pristine 2003 php, with a sane toolset. You'll learn all the quirks of the problem code as you do this.

When you've got a clean base, the team will be moving quicker, be more skilled with what they already are learning and listen to you. Then you can consider the structural changes.

Pure, clean 2003 php into a new format is way easier than spaghetti nightmare into total re-write.

If the code is so bad, how is it generating $20M a year?
When I started at my previous job as an IC, things looked similar - although they were at least using git already to share the code (deployments were made by uploading files to production anyway). The team was made up of a grumpy solo dev, an overly enthusiastic, hacker-type CTO, a very thoughtful but introverted engineering manager, and three junior devs. No tests, no migrations, secrets all over the place, no running locally, layers upon layers of hacks and required files, and a homegrown framework using obscure conventions (my pet peeve: the endpoint handler called was resolved dynamically by combining the request method and the URI part after /api/, so GET /api/foo/bar would call get_bar on the foo controller. As every method was public, this would also work for delete_internal_stuff).

What I did was form a mental plan for how to get the org to a more sensible state: namely, having the application run on a framework, within a container, with tests; have it deploy from CI into an auto-scaling cluster of container hosts; and make it configurable via environment variables. That was difficult, as the seniors all had reservations about frameworks, tests, and containers. So I went slowly, introducing things one by one, as they made sense:

* I started by rewriting core code as modules, in particular the database wrapper. They had cooked up an OOP abomination of a mysqli wrapper instead of just moving to PDO, so I wrote a proper PDO wrapper that exposed a compatibility layer for the old method calls and provided some cool "new" stuff like prepared statements. Modules like this could be installed from a private Composer registry, which helped justify the need for Composer.

* Instead of going for Symfony, I created a very thin framework layer from a few Symfony components on top of Slim. This didn't feel as "magic" as the bigger options would have, and didn't scare the devs away.

* To build up trust, I added an nginx in front of the old and new applications, using version-controlled configuration to route only a few endpoints to the new app selectively. This went well.

* Now that we had proper entry points, we could introduce middleware, centralized env-based config, and more. In the old app, we reused code from the new one to access the configuration. Dirty, but it worked. More and more code was moved over.

* I started writing a few tests for core functionality, which gave confidence that all this was really working fine. I wasn't able to make the other devs as enthusiastic about testing as I would have liked back then, though.

* Testing showed the need for dependency injection, so I introduced PHP-DI, which has the most elegant dependency injection mechanisms I know of. The senior devs actually surprised me here: they accepted this without resistance and even appreciated the ability to inject instances into their code.

* Deployments would now require uploading lots of files, so I introduced BuddyCI, which is probably the friendliest CI server. It would simply copy everything from the repository to the servers - a large step forward, considering the seniors suddenly couldn't just upload fixes anymore.
* With the deployments in place, I introduced development and production branches, and let the team discover the need for fix and feature branches by itself.

* To avoid having to run both apps and nginx by hand, I added container configuration and docker compose to spin up the stack with a single command. This convinced everyone.

* From there on, I added production-ready containers and set up Kubernetes on Google Cloud (something I wouldn't do at most places, but it made sense at this particular org). We deployed copies of the app into the cluster and set up a load balancer to gradually move requests over.

* One by one, we migrated services to the cluster, until practically all workloads were running as containers. The images were built by the CI, which would also run tests if available, push the images, and initiate the rolling update.

* At this point, things were very flexible, so I could add delicacies like dynamically deployed feature-branch previews, runtime secrets, and more.

All in all, we went from 80+ bare-metal servers (some of them not even used anymore) to a 12-node GKE cluster. Instead of manually updating individual files, we got CI deployments from production branches. Secrets in the code were gradually replaced with environment variables, which were in turn moved from source-controlled .env files to cluster secrets. Devs got confidence in their code thanks to tests, feature branches, and local execution. From a custom "framework", we moved to commonly known idioms, paving the way for a migration to a full framework.

What I didn’t manage was introducing database migrations, disciplined testing, and real secret management.

I hope this helps you, if only to draw inspiration to get started _somewhere_. Best of luck!

Haha, that's the life of contract developers! I've seen all of this so many times - minus the $20M of annual revenue.
> I have to find a strategy to fix this development team without managing them directly

Sorry what? What position are you in here? If you have no authority here then you are in a very precarious situation and you should figure that out first.

> this code generates more than 20 million dollars a year of revenue

> aggressive roadmap

> budget is really tight

Leave. If you care about the space, start a competitor.

$20m/year? With such a terrible codebase too, huh. Wow. Anyway what's the first letter of the place you work at?
As a prerequisite, get the code and database schema into source control ASAP (plus scheduled updates to CSVs of per-table row counts, per-table storage size, per-database storage size, etc.). You can do this entirely on your own machine on day 1, automating a rescan and committing the diff at least daily. Also regularly commit all of the production environment's configuration that you can get access to (devops config, installed packages and versions, bash histories, etc.).
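
As a sketch of what that day-1 job could look like (all paths and names here are placeholders; on the real box you would point this at the web root and also commit a `mysqldump --no-data` of the schema), the cron body only records drift:

```shell
# Hypothetical snapshot job. A scratch directory stands in for the
# production web root so this sketch runs anywhere; on the real server
# APP_DIR would be the web root, and a schema dump would sit beside it:
#   mysqldump --no-data --routines appdb > schema.sql
APP_DIR=$(mktemp -d)
cd "$APP_DIR"
git init -q
git config user.email snapshots@example.com
git config user.name snapshot-bot

echo '<?php // stand-in for index-new_2021-test-john_v2.php' > index.php

# The cron body: stage everything, commit only if something changed.
git add -A
git diff --cached --quiet || git commit -qm "snapshot $(date -u +%F)"

# A second run with no edits stays a no-op, so history records only drift.
git add -A
git diff --cached --quiet || git commit -qm "snapshot $(date -u +%F)"

COMMITS=$(git rev-list --count HEAD)
echo "commits recorded: $COMMITS"   # 1: the no-op run added nothing
```

The `git diff --cached --quiet` guard is what keeps the daily cron run from flooding the log with empty commits.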

In parallel is a review of the disaster recovery plan... do a full test restore of code + data from scratch!

I would then encourage an evaluation to get the lay of the land. If my intuition is correct, there are high priority problems in production that no one is aware of, well beyond the tech debt.

Start by setting up centralized error logging as quickly as possible, from the simple 404/500 error and database timeout reporting (is there any low-hanging fruit here redirecting URLs or speeding up the DB [indexes]?) to more deeply entangled server-side error reporting... ELMAH was an eye-opener when first dropped into an existing cowboy-style ASP.NET app, I don't know if something similar exists for PHP for free but you could learn a ton just trialing a commercial APM solution (same for db optimization tools).
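
Even before a real APM, a one-liner over the nginx access log surfaces the worst offenders. A sketch, assuming the default "combined" log format (request path in field 7, status in field 9); the heredoc stands in for the real log file:

```shell
# Quick triage: count 5xx responses per URL from the access log.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
1.2.3.4 - - [01/Jan/2023:00:00:01 +0000] "GET /checkout.php HTTP/1.1" 500 0 "-" "-"
1.2.3.4 - - [01/Jan/2023:00:00:02 +0000] "GET /index.php HTTP/1.1" 200 0 "-" "-"
5.6.7.8 - - [01/Jan/2023:00:00:03 +0000] "GET /checkout.php HTTP/1.1" 502 0 "-" "-"
EOF

# Tally statuses starting with 5, grouped by path, worst first.
awk '$9 ~ /^5/ { n[$7]++ } END { for (u in n) print n[u], u }' "$LOG" | sort -rn
# prints: 2 /checkout.php
```

Running this against a day of real traffic gives you a ranked list of broken endpoints for free, before any tooling decision is made.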

Then once the fires are identified and maybe even a few are out, analyze available metadata to determine the highest-traffic areas of the application. This combines client-side analytics, server-side logs, and database query profiling, and guides where issues should be fixed and tech debt should be paid down first. You can get down to "is this button clicked" if you need to, but "is this page/database table ever accessed" is helpful when getting started. (It's often nice to separate customers from employees here if you can, such as by IP if working from an office.)
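
The "separate customers from employees" idea can be sketched the same way: filter the office IP range out of the access log before ranking pages by traffic (192.0.2.* is a placeholder office range, and the heredoc stands in for the real log):

```shell
# Rank pages by customer traffic, excluding office IPs so employee
# usage doesn't skew which tech debt gets paid down first.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
192.0.2.10 - - [01/Jan/2023:09:00:00 +0000] "GET /admin.php HTTP/1.1" 200 0 "-" "-"
192.0.2.10 - - [01/Jan/2023:09:00:05 +0000] "GET /admin.php HTTP/1.1" 200 0 "-" "-"
203.0.113.5 - - [01/Jan/2023:09:00:10 +0000] "GET /product.php HTTP/1.1" 200 0 "-" "-"
203.0.113.5 - - [01/Jan/2023:09:00:11 +0000] "GET /product.php HTTP/1.1" 200 0 "-" "-"
198.51.100.7 - - [01/Jan/2023:09:00:12 +0000] "GET /product.php HTTP/1.1" 200 0 "-" "-"
198.51.100.7 - - [01/Jan/2023:09:00:13 +0000] "GET /cart.php HTTP/1.1" 200 0 "-" "-"
EOF

# Skip requests from the office range, tally the rest by path.
awk '$1 !~ /^192\.0\.2\./ { hits[$7]++ } END { for (p in hits) print hits[p], p }' "$LOG" | sort -rn
# prints: 3 /product.php
#         1 /cart.php
```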

Do you have the option of pursuing hardware upgrades to improve performance? (Is this on-prem?) You might want to dig into the details of the existing configuration, especially if the database hasn't been configured correctly. Which databases are on which drives, how are available IOPS allocated, can you upgrade RAM or SSDs? One big item here: if you are nearing any limits on disk space or IOPS, that might mean downtime if not addressed quickly.

In the cloud you have opportunity to find resources that are not being used anymore and other ways to cut costs. Here again you can trial commercial solutions for quick wins.

Finally, implement some type of ongoing monitoring to catch anything that happens rarely but may be absolutely critical. This might be best done through an automated scan of logs for new URLs and database queries. After a year to 18 months, you should have a good picture of which portions are completely dead (and can be excised instead of fixed). You can start cutting things out much sooner than that, but don't be surprised if a show-stopping emergency comes up at the end of the fiscal year, etc.!
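
The "scan for anything new" monitor can be as simple as keeping a sorted list of every URL path seen so far and diffing each day's log against it. A sketch with scratch files standing in for the persistent list and the day's extracted paths:

```shell
# "Anything new?" monitor: flag paths appearing for the first time.
# In a cron job, KNOWN would persist (e.g. under /var/lib), and TODAY
# would be the unique paths pulled from the day's access log.
KNOWN=$(mktemp)
TODAY=$(mktemp)
printf '/cart.php\n/index.php\n' > "$KNOWN"
printf '/cart.php\n/legacy_report.php\n' | sort -u > "$TODAY"

NEW=$(comm -13 "$KNOWN" "$TODAY")        # paths only in today's log
[ -n "$NEW" ] && echo "never-seen endpoints: $NEW"
sort -u -o "$KNOWN" "$KNOWN" "$TODAY"    # fold today's paths into the known set
```

After a year of this, the complement of the known set against the full route list is your candidate dead-code inventory.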

These are all easily justifiable actions to take as someone hired to get things headed in the right direction, and can earn the political capital necessary to begin pursuing all of the other recommendations in this thread for managing technical debt.

Edit: one mention in the thread of prioritizing restructuring the DB, sounds best but also tough.

“Before you heal someone, ask him if he's willing to give up the things that make him sick.” ― Hippocrates

You have to have a conversation with the people responsible for this shit, including (and especially) stakeholders, make them aware of the problem, and get them on board with the possible solution. This step is essential before even bothering to do fucking anything.

Most importantly, make it clear that while you are there to help, this is their responsibility, and they have to become a part of the solution by making amends. If they're not willing to own their responsibility and collaborate, get the fuck out of that tech debt mill or it will ruin your life.

If you want to try to redo everything alone in silence, you will have to work infinitely hard, and in the end, three things can happen:

a) You fail, and the organization gets rid of you. This is the most likely outcome.

b) You succeed, but now "you know too much": you have dirt on a lot of people who fucked up, and you become Comrade Legasov from Chernobyl, a target for the important people of the Soviet Communist Party. They will get rid of you once the problem is gone, because then you have no value to them.

c) In the best-case scenario, you succeed, but no one will congratulate you, because that would mean a problem existed in the first place; and since no one is willing to assume any responsibility for their contributions to the problem, no one will say fucking anything. All your contributions will count for nothing. And if you insist that a problem existed, you'll end up at outcome b). Otherwise, they will go back to their old ways and create the next fucking mess for you to solve.

Personally, I would get the fuck out. It is clear that nobody there was committed to do the right thing, starting from the hiring process. It is either highly unprepared people, extreme normalization of deviance, or some highly idiotic leadership obsessed with the short-term. Whatever it is, that team is rotten and needs an amputation. If I stayed, I would start by laying off the entire team and then rehiring everyone on a 3-month test period where they will have to completely change their attitude towards development.

Start with getting your source control and deployment in order. If you have to, lock down production so that the only way to deploy is via a checkin. Then fix the rest of the ops and get all the configs into source control, especially the NGInX config. Make sure memcache is set up for scaling later.

Then start in on the code. Start by writing some basic tests (you'll probably have to do this as a series of curl commands because it's unlikely the interfaces are clean enough to do it any other way). You'll need the tests to make sure everything else you do doesn't break major functionality.
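
A curl-based smoke suite can start as small as this sketch (not the author's actual suite; `BASE` and the example paths are placeholders, so nothing is requested here):

```shell
# Minimal smoke-test harness: assert an expected HTTP status for each
# critical URL, printing PASS/FAIL per check.
check() {
  url=$1; want=$2
  got=$(curl -s -o /dev/null -w '%{http_code}' "$url")
  if [ "$got" = "$want" ]; then
    echo "PASS $url"
  else
    echo "FAIL $url (wanted $want, got $got)"
  fi
}

# Usage against the real site would look like:
#   BASE=https://example.com
#   check "$BASE/" 200
#   check "$BASE/checkout.php" 200
#   check "$BASE/no-such-page" 404
```

Run it before and after every change; a FAIL line is your signal to roll back the last commit.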

Then do the easy stuff first. Fix the parts that curl itself and make it a real API call. Fix the dependency management. Compress the NGInX file by eliminating whatever rewrites you can by adding routing into the code. Test often, deploy often.
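
Consolidating the rewrites might eventually converge on a single front-controller rule, with routing decided in PHP instead of in 10,000 lines of nginx config. A hedged sketch only; the fallback file and socket path are assumptions, not the real setup:

```nginx
# One front-controller rule replaces per-page rewrites; PHP decides
# the route from the request URI.
location / {
    try_files $uri $uri/ /index.php?$args;
}

location ~ \.php$ {
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass unix:/run/php/php-fpm.sock;   # placeholder socket path
}
```

You would migrate toward this gradually, moving a handful of rewrites at a time into the PHP router while the remaining `location` blocks keep matching first.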

Enable tracing to figure out what code can be safely deleted. See if you can find old versions sitting around and do diffs.

Replace all the code that accesses the data store with a data access layer. Once you've done that, you can bring up a new data store with a proper schema. Make the data access layer write to the new data store and do queries by joining the old and new as necessary. If possible have the data access layer write any data it reads from the old data store into the new one after it serves the request, and read first from the new data store. Log how often you have to read from the old data store. In theory this will go down over time. Once there isn't a lot of reads from the old data store, write a program that runs in the background migrating the remaining data.

Most likely you can do all of that without anyone really noticing, other than teaching them a new way to write code by doing a checkin instead of in production. Also you'll have to teach them to use the data access layer instead of directly going to the data store.

After you've done all that, don't try and rewrite the code. Spin up a new service that does some small part of the code, and build it properly with frameworks and libraries and dependency management and whatever else makes sense. Change the main code to call your service, then delete the code in the main service and replace with a comment of where to find the new code. Maybe if no one else is working on that service they won't notice. Make sure new functionality goes in the new service with all the dependency management and such.

Keep doing that with small parts of the code by either adding into the new service or spinning up new micro services, whichever way you think is best. Ideally do this in the order of how often each function is called (you still have tracing on right?). Eventually most of the important stuff will be moved, and then you can decide if you want to bother moving the rest.

Hopefully by then you'll have a much better velocity on the most important stuff.

You're making $20M; you're probably spending less than $500k between you and 3 juniors.

I've led rewrites in worse circumstances (larger codebase split into 30 microservices, 15 people across 3 teams, making just $2M per year!) and I don't think you can do it with your current team. In the above example we downsized to a single team of 4 people and then rewrote the system as 2 right-sized services.

The new team was all new people (introduced gradually), while we shifted out the previous employees to other areas of the business.

The bottom line to take to management is that you need a more senior team. Hiring seniors is pretty hard nowadays, and it doesn't sound like you can offer much of an environment.

Get a good agency for $1M/year and let them work with your team to understand the ins and outs, and then replace them.

simonw's version control plan would be my step 1.

Step -2 is what you are doing now, OP, getting informed about the best way to go about this.

Step -1 is forming the battle plan of what you're going to change and in what order of importance.

Step 0 is communicating your plan to all stakeholders (owners, managers, devs, whoever) so they have an idea what is coming down the pipe. Here is where you assure them that you see this as a long process of continual improvement. Even though your end goal is to get to full VCS/CI/CD/DB Migrations/Monitoring, you're not trying to get there TODAY.

Step 1 is getting the codebase into a VCS. Get it in VCS with simonw's plan elsewhere in this thread. It doesn't have to be git if the team has another tool they want to put in place, but git is a decent default if you have no other preferences.

Step 2, for me, would be to make sure I had DB backups happening on a nightly basis. And, at least once, I'd want to verify that I could restore a nightly backup to a DB server somewhere (anywhere! Cloud/Laptop/On-prem)
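
The restore-verification step can be automated along these lines (a sketch: the mysqldump/mysql commands are shown only as comments, and a tiny stand-in dump lets the verify step itself run anywhere):

```shell
# Nightly backup plus restore verification.
BACKUP_DIR=$(mktemp -d)
STAMP=$(date +%F)
DUMP="$BACKUP_DIR/appdb-$STAMP.sql.gz"

# The real backup would be something like:
#   mysqldump --single-transaction appdb | gzip > "$DUMP"
echo 'CREATE TABLE orders (id INT);' | gzip > "$DUMP"

# Verify: an untested backup is not a backup. At minimum check archive
# integrity; ideally also restore into a scratch database, e.g.:
#   gunzip -c "$DUMP" | mysql appdb_restore_test
if gunzip -t "$DUMP"; then RESTORE_OK=yes; else RESTORE_OK=no; fi
echo "restore check: $RESTORE_OK"
```

Alert on anything other than a passing check; a backup job that silently produces corrupt archives is worse than no job, because it removes the urgency.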

Step 3, again, for me, would be to create an automatically-updated "dev" server. Basically create a complementary cronjob to simonw's auto-committer. This cronjob will simply clone the repo down to a brand new "dev" server. So changes will go: requirement -> developer's head -> production code change -> autocommit to github -> autoclone main branch to dev server.
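
That complementary cron body can be sketched like this (the "remote" below is a scratch stand-in for the real repo URL, built inline so the sketch runs anywhere):

```shell
# Dev-box cron body: clone once, then fast-forward on every run,
# so dev always tracks whatever the production auto-committer pushed.
REMOTE=$(mktemp -d)
DEV_DIR=$(mktemp -d)/app

# Build the stand-in remote (on the real box this already exists).
git init -q "$REMOTE"
git -C "$REMOTE" config user.email bot@example.com
git -C "$REMOTE" config user.name bot
echo '<?php // production snapshot' > "$REMOTE/index.php"
git -C "$REMOTE" add -A
git -C "$REMOTE" commit -qm "snapshot"

# The part that would live in crontab:
if [ -d "$DEV_DIR/.git" ]; then
  git -C "$DEV_DIR" pull -q --ff-only
else
  git clone -q "$REMOTE" "$DEV_DIR"
fi
```

Using `--ff-only` keeps the dev checkout an exact mirror: if anyone hand-edits the dev tree, the pull fails loudly instead of silently merging.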

Chances are nobody has any idea how to spin up the website on a new server. That's fine! Take this opportunity to document, in a `README.md` in your autocommitting codebase on the production server, the steps it takes to get the dev server running. Include as much detail as you can tolerate while still making progress. Don't worry about having a complete ansible playbook or anything. Just create a markdown list of steps you take as you take them. Things like `install PHP version X.Y via apt` or `modify DB firewall to allow dev server IP`.

Now you have 2 servers that are running identical code that can be modified independently of each other. Congratulations, you've reached dev-prod parity[1]!

Note that all of these changes can be done without impacting the production website or feature velocity or anyone's current workflow. This is the best way to introduce a team to the benefits of modern development practices. Don't foist your worldview upon them haphazardly. Start giving them capabilities they didn't have before, or taking away entire categories of problems they currently have, and let the desire build naturally.

There are a number of things you mentioned that I would recommend NOT changing, or at least, not until you're well down the road of having straightened this mess out. From your list:

> it runs on PHP

The important part here is that it _runs_ on anything at all.

> it doesn't use any framework

This can come much, much later, if it's ever really needed.

> no code has ever been deleted

As you make dev improvements, one day folks will wake up and realize that they're confident to delete code in ways they didn't used to be able to.

> no caching

Caching is a solution of last resort. If the current site is fast enough to do the job without caching, then don't worry about it.

[1]: https://12factor.net/dev-prod-parity

But, the code makes 20 million dollars a year. How bad can it be?
> no code is ever deleted. Things are just added.

Yeah, we hate that. On the one hand, it's impossible to build on a shaky foundation. On the other hand, software quality rarely correlates with revenue. That's why we call it work?

If this is an impression of the butt-hurt individual defending his work, then bravo because this is pretty funny.
I think you may need to use the probable security dumpster fire lurking in the code as an impetus for change.

I find it a little shocking that 3 junior engineers can’t be convinced to learn/try something new that might look good on their resume or make their lives easier.
