Anonymous
Mistakes are inevitable, but I believe in following processes that reduce them and help us foresee and plan around the ones we can anticipate.
An engineering team should follow these processes:
1. Thorough requirement analysis
2. Design and code review
3. Incremental development and testing
4. Automated testing
5. Monitoring and alerting in production
6. Post-deployment support with retrospectives and root cause analysis (RCA)
I was handling a project where we were integrating a third-party API for payment processing. We misunderstood a crucial part of the API's rate-limiting policy and did not implement exponential backoff correctly.
During a high-traffic event, our application hit the rate limit and, instead of retrying with exponential backoff, kept retrying immediately. This caused a spike in failed transactions, and customers complained.
We immediately halted the affected service and rolled back to the previous stable version. We then analysed the logs to confirm the issue, changed the retry logic to use exponential backoff, tested the fix under simulated high traffic, and documented the incident as a lesson learned.
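To make the fix concrete, here is a minimal sketch of retrying with exponential backoff and jitter. It is not our production code: `RateLimitError`, `backoff_delays`, `call_with_backoff`, and the default values (0.5 s base, factor 2, 30 s cap, 5 retries) are all hypothetical names and numbers chosen for illustration, and a real payment API may signal rate limiting differently.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error raised when the API answers with a rate-limit response."""


def backoff_delays(base=0.5, factor=2.0, cap=30.0, max_retries=5):
    """Yield the maximum wait before each retry: base, base*factor, ... capped at cap."""
    for attempt in range(max_retries):
        yield min(cap, base * factor ** attempt)


def call_with_backoff(request, max_retries=5, base=0.5, cap=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call request(); on RateLimitError wait (exponential backoff, full jitter) and retry."""
    delays = backoff_delays(base=base, cap=cap, max_retries=max_retries)
    while True:
        try:
            return request()
        except RateLimitError:
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted: surface the failure to the caller
            # Full jitter: wait a random fraction of the delay so that many
            # clients do not all retry at the same instant.
            sleep(delay * rng())
```

`sleep` and `rng` are parameters only so the behaviour can be checked in a test without real waiting. If the provider states how long to wait after a rate-limit response, honouring that value is usually safer than a guessed schedule.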
We learned that rate limiting is a resilience concern, not a performance one, and that we should not assume API behaviour based on the documentation alone: test and verify the edge cases to avoid such issues. We have added this lesson to the onboarding manual for new developers.
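One way to check such edge cases before production is a small simulation against a fake rate-limited API. The sketch below is only an illustration: `FakeRateLimitedApi` assumes a fixed-window limit (5 calls per second), which may not match the real provider's policy, and `rejected_calls` is a hypothetical helper that runs on a virtual clock so no real waiting is needed.

```python
import heapq


class FakeRateLimitedApi:
    """Hypothetical stand-in for a third-party API: `limit` calls per `window` seconds."""

    def __init__(self, limit=5, window=1.0):
        self.limit = limit
        self.window = window
        self.window_start = 0.0
        self.count = 0
        self.rejected = 0

    def call(self, now):
        """Return True if the call at virtual time `now` is accepted, False if rate limited."""
        if now - self.window_start >= self.window:
            self.window_start = now  # a new window begins
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        self.rejected += 1
        return False


def rejected_calls(delay_for_attempt, clients=20, max_attempts=8):
    """Simulate `clients` requests arriving at t=0 and count how many calls are rejected."""
    api = FakeRateLimitedApi()
    # Each event is (virtual time, client id, attempt number).
    events = [(0.0, client, 0) for client in range(clients)]
    heapq.heapify(events)
    while events:
        now, client, attempt = heapq.heappop(events)
        if not api.call(now) and attempt + 1 < max_attempts:
            retry_at = now + delay_for_attempt(attempt)
            heapq.heappush(events, (retry_at, client, attempt + 1))
    return api.rejected


immediate = rejected_calls(lambda attempt: 0.0)                 # retry at once
backoff = rejected_calls(lambda attempt: 0.25 * 2 ** attempt)   # exponential backoff
print(immediate, backoff)
```

Under these assumptions the immediate-retry strategy produces far more rejected calls than the backoff strategy. The backoff here has no jitter, so the clients still retry in lockstep, which is exactly the kind of edge case a simulation like this can surface.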