Engineering Contributions and Their Struggles
I really just wanted to use the ‘Help Wanted’ SpongeBob anchovies as my thumbnail image.
Talk about perfect timing. I’ve been waiting for this episode of Design System Office Hours by Davy and PJ, and I feel so much better. Sort of. I don’t actually feel better, because there is no clear solution, just like everything else (cue exhausted sigh). BUT at least I’m not alone, AND we are one step closer to having something like a contribution model.
This is not a burn-book post (if you’ve ever watched Mean Girls, that reference will make sense). It’s more about how we are running this Spartan race to find a solution for our specific needs, and how we are working through it.
Start 2 Finish, Finish 2 Start
There are two ways we are looking at contribution models. ‘Start to Finish’ is straightforward: a squad takes on a design system task from the start all the way to the end. In our case, that’s refactoring a component. ‘Finish to Start’ is the opposite: so that we are not blockers for other teams, we encourage them to build locally, and then the design system team migrates that code over to the main repository.
This post is about ‘Start to Finish’ and what we are finding.
My first options doc
Let me set the stage really fast. Design systems were not a new initiative at Housecall, but the level of emphasis on them is. The product scaled extremely fast during 2020-2021, so a federated system was absolutely needed, for all the reasons we know about design systems. The team was a bit of an experiment: gathering the right folks to (re)build the design system that was already being used (a post for another time), all while supporting a TON of legacy code.
As I’m helping define the roadmap, putting together dependency mapping, explaining which components we should work on first, why, and how it all daisy-chains into other things, blah blah blah, I’m suddenly asked this:
“What would it take to condense the multiple years view into a single quarter. No idea is off limits. If you had 100 engineers at your disposal or if you had 24 designers at your disposal or if you had 100 engineers AND 24 designers at your disposal or SOMETHING different. What would it take to get the bulk of the DS into the software by the end of 3Q24.”
That’s a pretty broad question. Here is how I broke this down.
Defining the definition of done - DOD
Since the team was still kind of young, our DOD was a bit of a moving goalpost, so that’s where we started so that we could work backwards. Our DOD would help us estimate timelines, the number of resources, who those resources are, and all the testing/QA plans. In short, our DOD was defined as:
“DOD = 90% of component has been staged, tested and in production”
A bit lofty, but admirable, plus we were going to get a billion resources 😅!
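One way to read that definition as a check, sketched in TypeScript. This assumes the 90% refers to the share of components that have made it all the way to production. The stage names, component names and the `meetsDod` helper are hypothetical and not taken from any real tracking setup.

```typescript
// Hypothetical rollout stages for a refactored component.
type Stage = "not-started" | "staged" | "tested" | "in-production";

interface ComponentStatus {
  name: string;
  stage: Stage;
}

// Assumption: "90%" means 90% of components have reached production.
const DOD_THRESHOLD = 0.9;

// Fraction of components that have reached the final stage.
function shareInProduction(components: ComponentStatus[]): number {
  if (components.length === 0) return 0;
  const done = components.filter((c) => c.stage === "in-production").length;
  return done / components.length;
}

// True when the share in production meets or exceeds the threshold.
function meetsDod(components: ComponentStatus[]): boolean {
  return shareInProduction(components) >= DOD_THRESHOLD;
}

// Made-up sample data: ten components, nine of them in production.
const sample: ComponentStatus[] = [
  ...Array.from({ length: 9 }, (_, i): ComponentStatus => ({
    name: `Component${i + 1}`,
    stage: "in-production",
  })),
  { name: "Component10", stage: "tested" },
];

console.log(shareInProduction(sample), meetsDod(sample));
```

Written this way, a component that is staged or tested but not yet live does not count toward the 90%, which is part of what makes the target lofty.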
I worked on different options, from the best-case scenario down to the bare minimum. Here are the proposed options broken down:
Solution 1 was our baseline: what we had working at the time, which was not much. Solution 5 was the biggest and pretty far-fetched, since there was no way we would have enough reviewers to comb through all the PRs, plus a bunch of other logistics. Solutions 2-4 were realistically the most viable. Each solution had a breakdown of pros and cons, technical complexities, technical assumptions and any major dependencies, based on what we thought to be true at the time.
Solution 2 was a bust. We didn’t have enough bodies and resources to dedicate a whole quarter to design system work. Getting that many additional people in PM, Design and Engineering was way too costly.
Solution 3 seemed possible. There were other squads that were eager to help contribute to the design system and showed interest after a brief engineering survey.
Solution 4 was just a Frankenstein of 2 and 3 with an additional web engineer. Hiring and onboarding in the proposed timeline was going to burn time and would work against us in the short term.
The first try
We decided to give Solution 3 a try. A squad would take a design system component that needed to be refactored and go through our process from start to finish. Here is the reality of what we found:
Timezones are a real thing. We have engineers all over the world and our Eng Manager lives in California. The time difference made architecture reviews and async work awful.
Domain engineers are NOT system engineers. We found out REALLY fast that domain engineers are experts in their domains, and that is not the same as being a system engineer. A single component’s scope spreads across multiple domains that a given domain engineer just does not get exposure to often, if ever.
Standardization of coding. Different engineers code a little differently from each other, but when you work on a system, coding semantics and structure have to be spot on, or else it breaks everything.
Async is hard. Engineers and designers were not saying the same things to each other, which led to a ton of back and forth across global teams, which equates to days burned just trying to figure out what we were saying to each other.
Following the process can be challenging for teams who don’t have any historical context. Since our team works with legacy code, we have checks and balances in place to ensure that we are not overwriting something REALLY important and breaking hundreds of instances. We roll changes out behind feature flags for this reason. Because communication had broken down for all the reasons above, these steps were getting missed. Yep, we broke staging a couple of times.
Design reviews and demos were skipped. Engineers were not used to sharing PR deploy previews, so that crucial step got missed, which led to me telling them to rebuild something.
IT WASN’T FASTER. Take everything I said above and put it on a timeline: it was not actually faster, it was much, much slower.
Solution 3 was clearly not going to work, and it was not achieving our end goal of getting the refactors done by the end of Q3.
The second try
We learned a lot about what didn’t work, and that was actually a really good thing. We exposed areas where we were lacking. Fixing them would help not only other teams but ourselves too.
We made templates. Every component needed a Spike document. Our Eng Manager and Senior React Eng worked together to build a Spike template that asked all the right questions we needed answered prior to an arch review.
We got tighter on our process. It became clear that every step and every milestone was crucial to a component’s success. Nothing gets skipped.
The importance of documentation. Not just for other teams, but for us as well. ‘How to’ documents, like how to set up a feature flag, how to open PRs, who needs to be a reviewer, etc.
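To make the feature flag ‘how to’ a bit more concrete, here is a minimal sketch in TypeScript of the kind of pattern such a document might walk through. The flag names, the `FlagStore` shape and the `chooseImplementation` helper are all hypothetical, not our actual flag setup. The idea is only that the legacy path stays the default until a flag is explicitly turned on.

```typescript
// Hypothetical flag names; a real system would define its own.
type FlagName = "ds-refactored-button" | "ds-refactored-modal";

// A simple in-memory flag store. Unset flags are treated as off.
type FlagStore = Readonly<Partial<Record<FlagName, boolean>>>;

// Only an explicit `true` enables a flag, so legacy code stays the default.
function isEnabled(flags: FlagStore, name: FlagName): boolean {
  return flags[name] === true;
}

// Pick between the legacy and refactored version of something.
function chooseImplementation<T>(
  flags: FlagStore,
  name: FlagName,
  legacy: T,
  refactored: T,
): T {
  return isEnabled(flags, name) ? refactored : legacy;
}

// Example: the button refactor is switched on, the modal refactor is not.
const flags: FlagStore = { "ds-refactored-button": true };

const buttonVariant = chooseImplementation<string>(
  flags,
  "ds-refactored-button",
  "LegacyButton",
  "RefactoredButton",
);
const modalVariant = chooseImplementation<string>(
  flags,
  "ds-refactored-modal",
  "LegacyModal",
  "RefactoredModal",
);

console.log(buttonVariant, modalVariant);
```

Defaulting to the legacy path means a missed or misconfigured flag leaves the existing instances untouched rather than breaking them.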
The second component refactor went a little smoother, but we still battled timezones and our Eng Manager’s split responsibilities. Reviews were still being handled async, even with templates, milestones and Jira statuses.
The third try
We took everything from the second attempt, changed architecture review times and prioritized pre-education. Instead of just throwing documents over to engineers and PMs, we started having kickoff meetings in person to talk about the goal, our process, which steps are important and which meetings are mandatory. We created specific temporary Slack channels for drilled-in conversations and progress sharing. We mapped out how to test and report bugs, and put processes in place for addressing those bugs.
There are probably a handful of ways this could have been solved sooner, easier and faster. The point is that every attempt our team has made has been a learning opportunity. We are still learning, still shaping the contribution models, and still working out whether they’re even worth the work. Something PJ and Davy bring up in their episode is choosing select elements that others can contribute to, like documentation on a specific pattern or component. Maybe that’s the next iteration of contribution models.
Thoughts
Do these attempts get us closer to getting the refactors done by the end of Q3? Yes. Is this a perfect solution? Absolutely not. It was, and still can be, a bit painful.
What’s important is that we are educating outside squads and raising awareness of why design systems are so important and why it matters for all of us to be working towards a common goal. We are incrementally building credibility and trust with other teams and with ourselves.
Are contributions good or bad?