Why Self-Service APIs can be your salvation

Johannes Kleinlercher
4 min read · Jul 20, 2020

There are a lot of reasons why platform teams should provide APIs for application teams (stream-aligned teams).

I would like to show you my experiences what happens, if there is a lack of those self-service APIs.

https://miro.com/app/board/o9J_kpdtPGs=/

increased wait-time

That is the most obvious one. If you need to call someone or create a ticket or JIRA task for another team, you have to wait. The other person has other priorities, and you can't expect the work to be done immediately. You wouldn't do that either.
Evan Bottcher calls this “backlog-coupling” in his excellent platform article.

higher parallelism

This happens on both sides: for you, who created the ticket, and for the assignee who has to work on it.

  • You, the creator of the ticket, have to fill the wait-time until your colleague finishes your ticket. Or at least you think you have to. You need “to do something”, don’t you? ;)
  • Your co-worker receives yet another small work order and may have to interrupt his/her current work for you.

overload

There is a good reason why an important Kanban principle says to set a WIP (work in progress) limit. Too much parallel work and too many context switches decrease your output and concentration.

And I don’t know anyone who loves doing countless things all day and then not being able to remember any details at the end of it.

Those are good reasons why platforms should provide self-services to developers, as I recently learned in a Team Topologies training.

high lead-time

Backlog-coupling between dependent teams, combined with overloaded teams, results in higher lead-time.

unclear requirements

Manual hand-offs do not just cost time; they also result in misunderstandings and unclear requirements. Natural language is not designed to be formal, exact and clearly understandable by everyone. There is a reason why scripts are far better than textual documentation for a given task.
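To make the difference tangible, here is a minimal sketch of what a formal, machine-checkable request could look like. The DatabaseRequest type, its fields and the allowed values are assumptions for illustration, not an existing platform API:

```python
# Hypothetical sketch: a structured self-service request instead of a free-text ticket.
# Field names and allowed values are made up for illustration.
from dataclasses import dataclass

ALLOWED_ENGINES = {"postgres", "mysql"}

@dataclass
class DatabaseRequest:
    app: str          # which application needs the database
    engine: str       # e.g. "postgres"
    size_gb: int      # requested storage
    environment: str  # "test" or "prod"

    def validate(self) -> None:
        # The API rejects ambiguous or incomplete requests immediately,
        # instead of a human asking back days later.
        if self.engine not in ALLOWED_ENGINES:
            raise ValueError(f"unknown engine: {self.engine}")
        if self.size_gb <= 0:
            raise ValueError("size_gb must be positive")
        if self.environment not in ("test", "prod"):
            raise ValueError("environment must be 'test' or 'prod'")

# Compare this to a ticket saying "we need a fairly big database for our app, prod-like":
request = DatabaseRequest(app="checkout", engine="postgres", size_gb=50, environment="prod")
request.validate()  # unambiguous, repeatable, and checkable by a machine
```

A request like this either passes validation or is rejected on the spot; there is no room for "I thought you meant a test database".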
Free-text work orders have the same problem. Sometimes the receiver is honest enough to ask back if something is unclear, which again increases wait-time, parallelism and lead-time. Worse, sometimes the receiver just does something, and you do not know what really happened. That leads me to …

misconfigured systems

With unclear requirements, the chance is very high that you don't get what you want (but maybe what you asked for, or at least what the receiver thought you asked for). If you only recognize this misconfiguration days or weeks later and don't see the connection to your last work order, you will spend a lot of time and effort on this failure. Which (surprise, surprise) increases overload and lead-time again.

production outages

Those misconfigured systems often end in production outages. As already mentioned, the best scenario is when the problems occur immediately; in that case the chance is high that you can correlate the outage with your change request. The worst scenario is when you only recognize the misbehaviour in your system days or weeks later.

No insight into what was really done

Systems without self-services often don't show you how the system is configured, either. It is kind of funny how common it is that teams/systems which do not offer self-services also do not publish their configuration. Yet at the time you create a change request, they often argue that “you as the work order submitter need to know exactly what the current configuration is and what you want to change”. A work order rarely starts with “please show me the configuration of component xy because I think we need some changes”.
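A self-service platform can close that gap with a read-only view of the applied configuration. The following is only a sketch of the idea; the component names and the CONFIG_REGISTRY structure are made up for illustration:

```python
# Hypothetical sketch: a read-only self-service view of the current configuration,
# so "please show me the configuration of component xy" never needs a ticket.
CONFIG_REGISTRY = {
    "checkout-db": {"engine": "postgres", "size_gb": 50, "environment": "prod"},
    "checkout-cache": {"type": "redis", "memory_mb": 512, "environment": "prod"},
}

def get_configuration(component: str) -> dict:
    """Return the currently applied configuration of a component."""
    try:
        return CONFIG_REGISTRY[component]
    except KeyError:
        raise KeyError(f"unknown component: {component}") from None

# Anyone can inspect the current state before requesting a change:
print(get_configuration("checkout-db"))
```

With something like that in place, "show me the current configuration" is a lookup, not a ticket.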
Be aware of change requests which lead to other (hidden) change requests:

No clean rollback possible

For example: you request a new database, and the DBA, who is a good friend of yours, also requests the firewall rule from your application to the database. After some time you don't need the database anymore and create a work order to delete it. Does the firewall rule get deleted as well? Maybe, maybe not.
And if not, it is …

Hard to do experiments

The reason for experiments is to find out whether things work or not, so it is quite normal that some experiments fail. And if they fail, you want to remove all the mess the experiment created. If you tried a complex caching system to improve performance, but this kind of caching system doesn't work as well as you'd like, removing it should also remove all indirectly created resources (e.g. dedicated caching images, filesystems, again firewall rules, or whatever). If that cleanup doesn't work, you add mess to the system with every experiment, which again leads to system outages.
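One way a self-service API can guarantee this, both for the database/firewall example above and for failed experiments, is to record everything it creates on your behalf as a child of the original request, so that a rollback removes all of it. This is only a sketch of the idea; the Resource class and the resource names are made up for illustration:

```python
# Hypothetical sketch: the platform records every indirectly created resource
# as a child of the original request, so a rollback removes all of it.
class Resource:
    def __init__(self, name: str, kind: str):
        self.name = name
        self.kind = kind
        self.children: list["Resource"] = []

    def create_child(self, name: str, kind: str) -> "Resource":
        child = Resource(name, kind)
        self.children.append(child)
        return child

    def delete(self) -> list[str]:
        """Delete this resource and everything that was created because of it."""
        removed = []
        for child in self.children:
            removed.extend(child.delete())
        removed.append(f"{self.kind}/{self.name}")
        return removed

# The caching experiment creates a cache plus everything it needs:
cache = Resource("perf-experiment-cache", "cache")
cache.create_child("cache-image", "image")
cache.create_child("cache-volume", "filesystem")
cache.create_child("app-to-cache", "firewall-rule")

# Rolling back the failed experiment removes the hidden resources too:
print(cache.delete())
# ['image/cache-image', 'filesystem/cache-volume', 'firewall-rule/app-to-cache', 'cache/perf-experiment-cache']
```

This mirrors how declarative platforms handle garbage collection: the resource you asked for owns whatever was created because of it, and deleting it cleans up the rest.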
