
Articles

Vol. 2 No. 1 (2023): April Edition 2023

Focus on resilience engineering in cloud services

Submitted: October 17, 2024
Published: 2023-01-18

Abstract

Resilience engineering is becoming increasingly important for cloud services as organizations rely on the cloud to support critical operations. The approach emphasizes designing systems that can withstand failures, disruptions, and unexpected challenges and recover from them quickly, so that availability and performance are maintained. It goes beyond traditional disaster recovery by proactively identifying vulnerabilities and strengthening the system before failures occur. By building resilience into cloud services, businesses can mitigate the risks posed by hardware failures, cyberattacks, and software bugs while maintaining a seamless user experience. The cloud's inherent complexity, with its distributed architecture and multi-layered infrastructure, presents distinct challenges that call for deliberate strategies for fault tolerance, redundancy, and real-time monitoring. Resilience engineering also fosters a mindset that treats failure as a learning opportunity, driving continuous improvement and adaptation. It depends on collaboration across development, operations, and security teams, so that resilience is considered at every stage of the cloud service lifecycle. As cloud adoption grows, resilience engineering is becoming a cornerstone of cloud strategy, helping organizations maintain trust, reliability, and operational excellence over the long term. This focus helps keep cloud services robust, reliable, and adaptable to future challenges, making them a key enabler of digital transformation.
