The Opportunity
For a service provider whose workloads have been developed over years and spread across various types of resources, the prevailing mindset is usually "if it's not broken, don't fix it." However, when the current ecosystem fails to meet the requirements of business growth or changes in the subject domain, any legacy enterprise can benefit from leveraging new technologies to improve service delivery.
With the current cloud fervor sweeping the industry, many IT houses feel that simply moving to the cloud will solve their problems as-is. While that may be true for simple workloads hosted on-premises, complex workloads call for a combination of the 6 R's (Rehost, Replatform, Retire, Re-architect, Repurchase, and Retain). Our client at the heart of this project, one of the largest and most reputed third-party administrators licensed by the Insurance Regulatory and Development Authority of India (IRDAI), partnered with SourceFuse to explore opportunities for cloud migration and modernization of its workloads.
Key Challenges
At SourceFuse, the starting point of any project is a deep discovery of the entire ecosystem. This assessment covered physical deployments, networking, third-party interactions, and compliance mandates, along with a high-level study of the application codebase, to identify a strategy for each aspect across the 6 R's in line with business needs. While almost anything can be re-architected, the constraints of time and money ultimately dictate the most suitable solution.
Based on a comprehensive and detailed assessment, conducted through collaborative discussions, document evaluations, and data-gathering tools, SourceFuse identified the following main concerns for its customer:
- Windows licensing for the virtual machines was a major cost that contributed nothing to business-specific needs.
- The infrastructure was fixed in size and ran 24×7: it wasted money at little or no load, while at peak load the system suffered degraded performance due to a resource crunch.
- Application deployments were completely manual, so considerable time and effort went into releasing each feature, delaying business improvements and hurting end-user satisfaction.
- All resources were set up and configured manually. The lack of infrastructure scripts made restoration impossible in the event of corruption, and every change had to be validated manually for correctness.
- Resource provisioning was intermingled, with little separation per application or environment.
- Security rules were tightly coupled to the on-premise firewall and could not be extended to the growing threat landscape of data breaches and hacking attacks. The rules were applied directly to the firewall and were not documented thoroughly, risking lost information whenever resources changed or the firewall crashed.
- The application was completely monolithic: a change to any part required a full deployment and testing cycle of the entire application.
- The application's legacy nature, with little adherence to development standards, made it difficult to maintain.
- The application stored data files on attached disk storage, which grew ever larger and became difficult to manage.
- Test cases were outdated and lacked automation, leading to lengthy testing cycles and delayed feature releases, since every release required a full regression performed manually.
Based on the effort each of these required and the client's priorities, SourceFuse set up a statement of work to address the above challenges while reducing the overall TCO of the ecosystem.
The Solution
SourceFuse set up two independent tracks for the solution: one for the infrastructure and another for application modernization.
Infrastructure Modernization
The main application was moved from .NET Framework to .NET Core, which allowed its hosting to be dockerized and moved to the Amazon ECS platform. This brought freedom from Windows licensing by using Linux-based, Graviton-backed EC2 instances as the foundational infrastructure.
ECS was also configured with a load-dependent scaling solution. This ensured the workload incurs negligible cost at low traffic volumes and can absorb any increase by automatically growing the number of task instances.
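As a rough illustration of that setup, the sketch below uses the AWS CDK for .NET to place an ECS service on Graviton-backed capacity with target-tracking auto scaling. The instance type, task sizing, image name, and thresholds are illustrative assumptions, not the project's actual values.

```csharp
// Minimal sketch of Graviton-backed ECS with load-dependent scaling (AWS CDK for .NET).
using Amazon.CDK;
using Amazon.CDK.AWS.EC2;
using Amazon.CDK.AWS.ECS;
using Constructs;

public class EcsStack : Stack
{
    public EcsStack(Construct scope, string id, IStackProps props = null) : base(scope, id, props)
    {
        var vpc = new Vpc(this, "Vpc");
        var cluster = new Cluster(this, "Cluster", new ClusterProps { Vpc = vpc });

        // Graviton (ARM64) capacity removes the Windows licensing cost entirely.
        cluster.AddCapacity("GravitonCapacity", new AddCapacityOptions
        {
            InstanceType = new InstanceType("c6g.large"), // assumed size
            MinCapacity = 1,
            MaxCapacity = 6
        });

        var taskDef = new Ec2TaskDefinition(this, "TaskDef");
        taskDef.AddContainer("app", new ContainerDefinitionOptions
        {
            Image = ContainerImage.FromRegistry("example/app:latest"), // placeholder image
            MemoryLimitMiB = 1024
        });

        var service = new Ec2Service(this, "Service", new Ec2ServiceProps
        {
            Cluster = cluster,
            TaskDefinition = taskDef
        });

        // Target tracking keeps costs negligible at low traffic and scales out
        // the task count automatically as CPU load rises.
        var scaling = service.AutoScaleTaskCount(new EnableScalingProps { MinCapacity = 1, MaxCapacity = 10 });
        scaling.ScaleOnCpuUtilization("CpuScaling", new CpuUtilizationScalingProps { TargetUtilizationPercent = 60 });
    }
}
```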
To reduce overall costs, we weighed several options: processor families, instance sizes, storage volumes, usage patterns, and reserved instances. While the non-modernized workloads were left intact for the time being, the ECS backbone was set to Graviton, which offered a lower cost while remaining compliant with the needs of a modernized application. In addition, the non-production workloads were set to shut down on weekends and outside working hours to save further costs. Because these workloads were expected to keep evolving through the modernization journey, we deferred reserved instances until production usage had stayed stable for a few months of validation.
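One common way to implement such a turn-down schedule, and a plausible shape for ours, is an EventBridge cron rule invoking a small Lambda that stops tagged instances. The tag keys and values below are assumptions for illustration.

```csharp
// Sketch of a non-production shutdown handler (AWS SDK for .NET),
// invoked on a schedule such as cron(0 14 ? * FRI *) for the weekend turn-down.
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Amazon.EC2;
using Amazon.EC2.Model;

public class NonProdScheduler
{
    private readonly IAmazonEC2 _ec2 = new AmazonEC2Client();

    public async Task StopNonProdAsync()
    {
        // Find running instances tagged as non-production (tag scheme assumed).
        var running = await _ec2.DescribeInstancesAsync(new DescribeInstancesRequest
        {
            Filters = new List<Filter>
            {
                new Filter("tag:Environment", new List<string> { "dev", "qa", "stage" }),
                new Filter("instance-state-name", new List<string> { "running" })
            }
        });

        var ids = running.Reservations
            .SelectMany(r => r.Instances)
            .Select(i => i.InstanceId)
            .ToList();

        if (ids.Count > 0)
            await _ec2.StopInstancesAsync(new StopInstancesRequest { InstanceIds = ids });
    }
}
```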
The modernized application was continuously integrated and deployed with AWS CodePipeline, seamlessly connected to source control and Amazon ECS. Along with standard compilation checks, we enforced application coding standards with Sonar and ran vulnerability scans with Snyk to ensure that only well-written code progressed further in the workflow. Environment-specific controls ensured that development builds ran continuously on every merge to the develop branch, while QA and Stage builds were triggered manually for controlled releases.
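A minimal sketch of that gating pattern, again assuming the AWS CDK for .NET, is shown below: source and build stages run automatically, while promotion beyond them waits on an explicit approval. Repository names, the connection ARN, and stage layout are all illustrative, and deploy actions would follow the approval stage.

```csharp
// Sketch of an environment-gated CodePipeline (AWS CDK for .NET).
using Amazon.CDK;
using Amazon.CDK.AWS.CodeBuild;
using Amazon.CDK.AWS.CodePipeline;
using Amazon.CDK.AWS.CodePipeline.Actions;
using Constructs;

public class PipelineStack : Stack
{
    public PipelineStack(Construct scope, string id, IStackProps props = null) : base(scope, id, props)
    {
        var source = new Artifact_("Source");
        var build = new Artifact_("Build");

        // Hypothetical CodeStar connection to the source repository.
        var sourceAction = new CodeStarConnectionsSourceAction(new CodeStarConnectionsSourceActionProps
        {
            ActionName = "Checkout",
            Owner = "example-org",   // assumed
            Repo = "claims-app",     // assumed
            Branch = "develop",      // builds run on every merge to develop
            ConnectionArn = "arn:aws:codestar-connections:ap-south-1:111111111111:connection/example",
            Output = source
        });

        // The buildspec in source would run compilation plus Sonar and Snyk checks.
        var buildAction = new CodeBuildAction(new CodeBuildActionProps
        {
            ActionName = "BuildAndScan",
            Project = new PipelineProject(this, "BuildProject"),
            Input = source,
            Outputs = new[] { build }
        });

        new Pipeline(this, "Pipeline", new PipelineProps
        {
            Stages = new[]
            {
                new StageProps { StageName = "Source", Actions = new IAction[] { sourceAction } },
                new StageProps { StageName = "Build", Actions = new IAction[] { buildAction } },
                // QA/Stage releases wait on a manual approval, mirroring the
                // manually triggered builds described above.
                new StageProps
                {
                    StageName = "ApproveQA",
                    Actions = new IAction[]
                    {
                        new ManualApprovalAction(new ManualApprovalActionProps { ActionName = "Approve" })
                    }
                }
            }
        });
    }
}
```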
To enable quick recovery and validation of the infrastructure, the complete setup and configuration was done with infrastructure as code (IaC) scripts. To accelerate this, we utilized our home-grown Application Reusable Components (ARC), leading to faster delivery while conforming to the design standards expected by AWS's Well-Architected Framework as well as general industry practice. Changes to the IaC can be tied to specific requirements, establishing a traceability matrix that helps the compliance team validate and approve them.
We set up a landing zone, organizing the entire infrastructure into a set of organizational units, each representing a specific environment, along with separate units for shared services and security management. This provided much-needed clarity on resource placement, along with separate billing that enables dedicated cost centers for better financial management.
The client had been using a physical firewall to enforce its security rules. While moving their complete organization to AWS, they wanted the new environment to comply with the existing rules and also add protection against new threats. SourceFuse set up a dedicated organizational unit for security to manage this crucial requirement centrally. All the existing rules were migrated into an instance of AWS Security Hub, and we additionally set up new rules as dedicated guardrails to comply with insurance-specific requirements.
Application Modernization
As with any legacy application, the codebase had inherited several issues over time. This was a monolithic MVC application on .NET Framework 3.5. While the controller layer was quite thin, most of the logic lived in SQL Server stored procedures, with client-side handling in jQuery-based JavaScript that often called server methods via Ajax.
Based on this, and on the application's overall size and complexity, we had two options for modernizing:
- Rewrite the entire application, with a client-side Angular or React frontend and server-side .NET Core Web APIs.
- Reorganize the application's codebase and move it to a .NET Core web application, taking up specific optimizations on a case-by-case basis.
Since the main goal was to eliminate the licensing cost of Windows-based hosting, and extracting business logic from plain JavaScript into Angular would have been complex, the second option was the better choice. It was more important to deliver functionality step by step within the constraints of time and money, ensuring alignment with customer needs rather than technical perfection alone.
While most code is directly compatible between .NET Framework and .NET Core, requiring only changes to the underlying libraries, every application carries third-party dependencies that behave differently across such a migration. In this case there were tight dependencies on the Windows GDI component for image manipulation, and some other libraries were effectively abandoned, so the affected code required a rewrite or a repurchase.
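To make the GDI problem concrete, here is a hedged example of the kind of change involved: a System.Drawing call, which fails on Linux hosts without GDI+, swapped for a cross-platform library. SixLabors.ImageSharp is used as one possible option, not necessarily the library chosen on this project.

```csharp
// Replacing a Windows-GDI image operation with a cross-platform equivalent.
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.Processing;

public static class ThumbnailHelper
{
    // Previously: new System.Drawing.Bitmap(...) + Graphics.DrawImage(...),
    // which depends on GDI+ and breaks on Linux-based hosting.
    public static void CreateThumbnail(string inputPath, string outputPath, int width, int height)
    {
        using var image = Image.Load(inputPath);   // load from disk
        image.Mutate(ctx => ctx.Resize(width, height)); // resize in place
        image.Save(outputPath);                    // format inferred from extension
    }
}
```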
The major problem was an Excel library used heavily to parse incoming files from integration partners. The component had no .NET Core version, so we had to find an equivalent that could be integrated seamlessly with the existing code. This led to several spikes in which we evaluated paid and open-source toolkits, reviewing their integration points against the existing logic to keep the required code changes minimal.
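One way to keep such a swap seamless, sketched below under assumptions, is to hide the replacement toolkit behind a small interface so call sites barely change. ClosedXML appears here as an example of an open-source option, not the confirmed pick, and the interface is hypothetical.

```csharp
// Wrapping the replacement Excel toolkit behind an interface to minimize code change.
using System.Collections.Generic;
using ClosedXML.Excel;

public interface IExcelParser
{
    // Each yielded list is one row of cell values, as strings.
    IEnumerable<IList<string>> ReadRows(string path);
}

public class ClosedXmlParser : IExcelParser
{
    public IEnumerable<IList<string>> ReadRows(string path)
    {
        using var workbook = new XLWorkbook(path);
        var sheet = workbook.Worksheet(1); // first worksheet, as partner files assume

        foreach (var row in sheet.RowsUsed())
        {
            var cells = new List<string>();
            foreach (var cell in row.CellsUsed())
                cells.Add(cell.GetString());
            yield return cells;
        }
    }
}
```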
The application stored its input files entirely on attached volumes, so storage had grown heavily over the years (insurance requires seven years' worth of data). We updated the code to use Amazon S3 instead, ensuring safe and secure storage. This also enables future movement to cold storage options like Amazon S3 Glacier once the files are no longer active.
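The code change itself is small, roughly as sketched below with the AWS SDK for .NET; the bucket name and key scheme are illustrative.

```csharp
// Moving file writes from the attached volume to S3 (AWS SDK for .NET).
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Transfer;

public class DocumentStore
{
    private const string BucketName = "example-claims-documents"; // assumed name
    private readonly TransferUtility _transfer = new TransferUtility(new AmazonS3Client());

    // Previously something like: File.Copy(localPath, Path.Combine(mountedVolume, key));
    public Task SaveAsync(string localPath, string key)
        => _transfer.UploadAsync(localPath, BucketName, key);
}
```

An S3 lifecycle rule can then transition objects to Glacier after the active window, with no further code changes.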
While we were converting modules, a major challenge arose: the client's team was still working on the older version for feature enhancements and bug fixes. We had to continuously synchronize with their codebase and incorporate the delta changes almost every month. This meant repeated changes to the same modules, yet releasing individual modules was not possible, given their interdependence, until the entire system could be released.
As database access was scattered across multiple places, we needed to consolidate the stored procedure names and inline queries in one place. Initially this was attempted manually whenever a specific API was being migrated. We then built a text-search utility that scanned the entire application and generated a file of constants. Next, we applied the same strategy to session and viewstate keys, replacing their inline usages wherever they already existed. This not only saved the time of manually creating each new constant and replacing its usages, it also reduced code-review effort and improved consistency.
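An illustrative reconstruction of such a utility is shown below; the regex, naming convention, and output shape are assumptions, not the actual tool.

```csharp
// Scans the codebase for stored procedure names in inline strings
// and emits a single constants class.
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class ProcConstantGenerator
{
    // Matches quoted identifiers like "usp_GetClaimStatus" (naming convention assumed).
    private static readonly Regex ProcName = new Regex("\"(usp_[A-Za-z0-9_]+)\"");

    public static void Generate(string sourceRoot, string outputFile)
    {
        var names = new SortedSet<string>(StringComparer.Ordinal);

        // Collect every distinct procedure name across all C# files.
        foreach (var file in Directory.EnumerateFiles(sourceRoot, "*.cs", SearchOption.AllDirectories))
            foreach (Match m in ProcName.Matches(File.ReadAllText(file)))
                names.Add(m.Groups[1].Value);

        // Emit one constant per procedure, ready to replace inline strings.
        using var writer = new StreamWriter(outputFile);
        writer.WriteLine("public static class StoredProcs");
        writer.WriteLine("{");
        foreach (var name in names)
            writer.WriteLine($"    public const string {name} = \"{name}\";");
        writer.WriteLine("}");
    }
}
```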
The SourceFuse Approach
Anything can be consumed only once it has been tested against both functional and non-functional requirements. With our modernization-led migration approach, we needed to test not only the application but the infrastructure as well.
For the virtual machines, after migration to Amazon EC2, the client's IT team verified that the configuration was intact and could be synchronized with the policies defined in the domain controller. The application team validated the application settings by running sample workflows to confirm that database and third-party connectivity worked properly. The security settings were validated by sending deliberately malformed requests to confirm the firewall blocked them while letting legitimate traffic through, without impacting overall performance.
On the application front, most of the documented test cases for this legacy application were quite old, and their edge cases no longer applicable. We discussed each test case explicitly with the client and designed appropriate test data to match the various scenarios. As this was quite time consuming, we decided to set up automated testing of the underlying controller methods. With this in place, we could validate regressions very quickly and only needed to focus manual effort on frontend-dependent workflows.
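A hedged sketch of such a controller-level test follows, using xUnit and Moq; ClaimsController, IClaimRepository, and Claim are hypothetical stand-ins for the client's actual types, included here so the example is self-contained.

```csharp
// Controller-level regression test: no browser, no database, runs in milliseconds.
using Microsoft.AspNetCore.Mvc;
using Moq;
using Xunit;

// Minimal stand-in types for the example.
public class Claim { public int Id { get; set; } }
public interface IClaimRepository { Claim GetById(int id); }

public class ClaimsController : Controller
{
    private readonly IClaimRepository _repo;
    public ClaimsController(IClaimRepository repo) => _repo = repo;
    public IActionResult Details(int id) => View(_repo.GetById(id));
}

public class ClaimsControllerTests
{
    [Fact]
    public void Details_ReturnsViewWithClaim_WhenClaimExists()
    {
        // Stub the data layer so the check needs no database.
        var repo = new Mock<IClaimRepository>();
        repo.Setup(r => r.GetById(42)).Returns(new Claim { Id = 42 });

        var result = new ClaimsController(repo.Object).Details(42);

        var view = Assert.IsType<ViewResult>(result);
        var model = Assert.IsType<Claim>(view.Model);
        Assert.Equal(42, model.Id);
    }
}
```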
Disaster Recovery
Every business has to be resilient against natural disasters and should therefore host across geographical regions. Due to government compliance requirements, insurance data originating in India cannot leave the country's boundaries. As Mumbai was originally the only AWS region in India, all resources were initially set up there. While the original plan was to rely on multiple availability zones within the Mumbai region, our client required physical geographical separation as well to secure governance board approval. Fortunately, as the project started, AWS launched a second Indian region in Hyderabad, so we continuously tracked the availability of the required services there.
Being a new region, its service coverage was still sparse, both in which services were offered and in their feature completeness, so the decision was made to adopt resource-based disaster recovery (DR) rather than depending on every service existing in the DR region. The client also wanted only a few resources, such as the database, running live in the DR region.
We set up AWS Elastic Disaster Recovery (DRS) to replicate the EC2 instances and their attached volumes to the DR region in Hyderabad. IaC scripts can recreate all the other resources, so fallback is not an issue. The primary application on ECS was set up with zero instances in the Hyderabad region; whenever DR is triggered, we update the DNS records to point to the new region and scale the instance count up. On fallback, the CI/CD configuration redeploys the application to the Mumbai region.
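The DR trigger reduces to two API calls, roughly as sketched below with the AWS SDK for .NET: scale the standby ECS service up in Hyderabad (ap-south-2) and repoint DNS. Cluster, service, record, and hosted zone names are placeholders.

```csharp
// Sketch of the DR trigger: scale up the standby service, then repoint DNS.
using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon;
using Amazon.ECS;
using Amazon.ECS.Model;
using Amazon.Route53;
using Amazon.Route53.Model;

public static class DrFailover
{
    public static async Task TriggerAsync()
    {
        // Bring the zero-instance standby service up in ap-south-2 (Hyderabad).
        var ecs = new AmazonECSClient(RegionEndpoint.APSouth2);
        await ecs.UpdateServiceAsync(new UpdateServiceRequest
        {
            Cluster = "claims-cluster",  // assumed
            Service = "claims-service",  // assumed
            DesiredCount = 2
        });

        // Repoint the application record at the DR load balancer.
        var route53 = new AmazonRoute53Client();
        await route53.ChangeResourceRecordSetsAsync(new ChangeResourceRecordSetsRequest
        {
            HostedZoneId = "ZEXAMPLE123", // placeholder
            ChangeBatch = new ChangeBatch
            {
                Changes = new List<Change>
                {
                    new Change
                    {
                        Action = ChangeAction.UPSERT,
                        ResourceRecordSet = new ResourceRecordSet
                        {
                            Name = "app.example.com",
                            Type = RRType.CNAME,
                            TTL = 60, // short TTL keeps failover cutover fast
                            ResourceRecords = new List<ResourceRecord>
                            {
                                new ResourceRecord { Value = "dr-alb.ap-south-2.elb.amazonaws.com" } // placeholder
                            }
                        }
                    }
                }
            }
        });
    }
}
```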
The Results
No matter how experienced one may be, every project offers fresh ideas and teaches new lessons. This one was a great learning experience for the whole team in decoupling delivery from infrastructure and application improvements, while ensuring the customer got the best possible outcome within the budget of time and money.
- By establishing the application framework in the first sprint and validating it against the first module, we gained confidence that we were on the right track.
- Automating as much as possible is time consuming initially but pays big returns later. We not only automated code deployment via CI/CD pipelines, we also ensured that the infrastructure code follows the same traceability, evolves with the application's needs, and is deployed along with the application itself.
- Several times we implemented a feature and then rejected it, because it failed to meet requirements or simply did not work completely, but we made sure to document the steps taken and the reasons for rejection. This built up a solid knowledge base for future initiatives.
- Modernization is not a one-step process; it's a journey. We organized application artifacts by module, so that each becomes a candidate for future improvements, targeted on a need basis.
- Establish a technical debt register to support future enhancements. Since no implementation can be perfect and suited to every need, some debt will remain from the technology choices made; it can be addressed later when considering further .NET improvements or new AWS capabilities.
About The Customer
Founded in 1995 and headquartered in Hyderabad, the customer at the heart of this case study was licensed by IRDAI in 2002 and is currently one of the largest and most reputed IRDAI-licensed Third Party Administrators (TPA) in India. It has a presence in over 55 locations across 25 states pan-India, catering to individual customers, corporate clientele, and state / central government sponsored health schemes, and offering a range of allied healthcare and wellness services to its members. Its constantly growing client base is a testament to its consistency and its endeavor to render service par excellence.