Chances are you weren’t giving much thought to the global stability or security of the Internet on Friday, October 21, 2016. Even if you run an Internet-based business, it likely wasn’t a top concern. But it certainly was on that Friday! That was the day when, for a time, it seemed the Internet itself was at risk of catastrophic failure.
In case you’re reading this post several years after its publication (which, thankfully, means the Internet is still functioning), here’s a quick refresher: October 21, 2016 was the date we experienced what cybersecurity experts at the time deemed one of the largest and most sophisticated hacks ever on the Internet itself. The DDoS (“distributed denial-of-service”) attack was launched from thousands of Internet-connected devices (digital cameras, DVRs, etc.) that had been infected with malware called Mirai. It significantly slowed web traffic and disrupted or even shut down service for many of the world’s most popular websites.
“Is your roadmap flexible enough to handle a disaster?”
Again, if you’re reading this several years from now, the affected sites included Hotmail, AltaVista, CompuServe and MySpace… just kidding. The attack actually affected sites like Amazon, Twitter, Spotify and PayPal.
Although not all of the specific security measures and defense protocols used to counteract this DDoS attack were shared with the general public, it seems the companies responsible for the Internet’s backbone handled things as well as we could have hoped.
Perhaps most relevant to our discussion here, the company responsible for the managed DNS infrastructure most affected by the attack — an organization called Dyn — seemed to have a disaster preparedness plan already in place for just such an event. Dyn’s plan included:
- Technical countermeasures to stop, slow, or at least mitigate the damage
- A real-time investigation to trace the attack back to its source
- Real-time learning to help the company immediately develop and deploy new measures to strengthen the Internet’s defenses and prevent similar future attacks
- An ongoing communications plan to keep the hack’s most affected businesses, law enforcement agencies, the media, and the public informed at all stages of the attack
What is Your Product Team’s Plan for a Disaster?
As damaging and concerning as that Internet DDoS attack was, it does present us with a silver lining. (See that? No cloud puns. You’re welcome.)
This hack serves as a great reminder of the need to review our own disaster prevention and disaster recovery plans — or, to create them for the first time.
So the question for you and your business is this: How would you handle a disaster that affected your product and your customers?
Do you have a plan in place that can be activated if your product experiences a failure or your company suffers a disaster such as flooding or a cyber hack? Are all of the necessary people in your organization prepared for such an event? Will they know how to respond?
And just as important, are you building preventative measures into your product itself? A strong disaster response plan will help minimize the harm done to your company and customers in the event of an unexpected emergency.
With this massive attack still in the news, this might be a great time for you and your product team to review your disaster preparedness protocols, or develop new ones, and to earn the needed buy-in from your stakeholders to build both disaster prevention and disaster recovery planning into your product roadmap.
Here are some suggestions.
Disaster Prevention (or Mitigation) for Product Managers
As a product manager you are under relentless pressure to add game-changing features to your product. But you need to balance your roadmap’s focus between those headline-grabbing features your stakeholders and customers demand… and the much less exciting but equally important components that will give your product stability, safety, and protection against disruption or failure.
Some examples of these to feature on your roadmap include:
Compliance and certification
Bringing your product into compliance with data regulations like HIPAA, or obtaining quality-assurance or security certification from standards bodies such as ISO, can be a good idea. These certifications can help establish your company’s credibility with prospective customers. But they are also important for a more practical reason.
Gaining certification from these organizations can serve as a valuable proxy for determining that, yes, your product can withstand many forms of disaster — and either maintain normal operations or recover in a timely manner.
The right data backup infrastructure
If you’re running a SaaS (Software-as-a-Service) application, part of your product development needs to focus on how your customer data will be backed up. Will you store it on a single server? Should you maintain multiple copies of each customer’s data? Will you keep these copies in separate geographical locations to ensure redundancy? Will you encrypt this data while it’s at rest under your care?
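The questions above can be turned into a simple checklist that your team runs against a proposed backup plan. What follows is only a minimal sketch in Python: the `BackupCopy` type, the `review_backup_plan` function, the region labels, and the default thresholds of two copies in two locations are all hypothetical choices made for illustration, not a standard or a recommendation from any regulator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One stored copy of customer data (hypothetical model for illustration)."""
    location: str            # assumed region label, e.g. "us-east"
    encrypted_at_rest: bool  # whether this copy is encrypted while stored


def review_backup_plan(copies, min_copies=2, min_locations=2):
    """Return a list of gaps found in the plan; an empty list means none found.

    The thresholds are assumptions; pick values that fit your own product.
    """
    gaps = []

    # Is there more than a single copy of the data?
    if len(copies) < min_copies:
        gaps.append(f"found {len(copies)} copies, want at least {min_copies}")

    # Are the copies spread across separate geographical locations?
    locations = {copy.location for copy in copies}
    if len(locations) < min_locations:
        gaps.append(
            f"found {len(locations)} distinct locations, want at least {min_locations}"
        )

    # Is every copy encrypted at rest?
    unencrypted = sorted({c.location for c in copies if not c.encrypted_at_rest})
    if unencrypted:
        gaps.append("unencrypted copies in: " + ", ".join(unencrypted))

    return gaps


if __name__ == "__main__":
    plan = [BackupCopy("us-east", True), BackupCopy("us-east", False)]
    for gap in review_backup_plan(plan):
        print("GAP:", gap)
```

A check like this does not back anything up by itself. Its value is that it forces the redundancy and encryption questions to be answered explicitly, which makes them easier to put on the roadmap.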
And if you sell a software product that your customers download and maintain on premise, what responsibilities, if any, will you maintain for backing up their data? Will you have a recommended backup architecture for them to maintain? Will it include mirroring their data to an offsite cloud backup service?
To return to our earlier suggestion about certification and compliance, it’s worth noting that federal data laws like HIPAA and GLBA require regulated businesses to safeguard their customers’ data. Maintaining a secure backup, commonly offsite, is one of the measures these rules call for. This is to ensure that if the company’s headquarters are affected by a power outage or fire, their customers’ data can still be recovered and accessed.
An appropriate level of fault-tolerance
Another important but unsexy component of your product roadmap should be a focus on setting up the right hardware to support your expected levels of traffic.
You will need to understand the volume of traffic your product is likely to receive: how many simultaneous visitors it will have, which features they will be using, and how much bandwidth that usage will require to maintain a stable and acceptable user experience.
The hardware architecture you’ll need to support 1,000 customers will of course be very different from the hardware you’ll need to handle 1,000,000 customers.
For this reason, you might also want to campaign for a hardware architecture that is designed for faster and easier scalability. That way, if your product experiences the coolest of all “disasters,” being overwhelmed by greater-than-expected customer usage, you will be able to quickly ramp up capacity to keep your customers enjoying a high-quality user experience.
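To make the gap between 1,000 and 1,000,000 customers concrete, a back-of-the-envelope estimate can help. The Python sketch below is only an illustration: the function names are hypothetical, and every number in it (5% of customers online at once, 500 users per server, 30% headroom, 200 kbps per user) is an assumption you would replace with measurements from your own product.

```python
def concurrent_users(customers, concurrent_pct):
    """Estimate simultaneous visitors as a percentage of all customers (rounded up)."""
    return -(-customers * concurrent_pct // 100)  # integer ceiling division


def servers_needed(concurrent, users_per_server, headroom_pct=30):
    """Servers required if each one is loaded only up to (100 - headroom_pct)%."""
    if users_per_server <= 0:
        raise ValueError("users_per_server must be positive")
    if not 0 <= headroom_pct < 100:
        raise ValueError("headroom_pct must be in [0, 100)")
    usable_x100 = users_per_server * (100 - headroom_pct)
    # Ceiling division in integers avoids floating-point rounding surprises.
    return max(1, -(-concurrent * 100 // usable_x100))


def peak_bandwidth_mbps(concurrent, kbps_per_user):
    """Aggregate bandwidth at peak, converted from kbps to Mbps."""
    return concurrent * kbps_per_user / 1000


if __name__ == "__main__":
    for customers in (1_000, 1_000_000):
        online = concurrent_users(customers, concurrent_pct=5)
        print(
            customers, "customers:",
            online, "online at peak,",
            servers_needed(online, users_per_server=500), "servers,",
            peak_bandwidth_mbps(online, kbps_per_user=200), "Mbps",
        )
```

Keeping the headroom as an explicit parameter is a deliberate choice here: it is the spare capacity that buys you time to scale when usage outruns your forecast.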
“Fault tolerance: not as sexy as a new feature, but still important for your roadmap.”
Of course, some of the suggestions we’re offering here might fall under the responsibility of your engineering team or some other department in your company — and not product management. That’s okay. The important thing is that you have these disaster prevention items on your mind and that they make it onto the product roadmap.
How about you? What disaster preparedness or recovery strategies has your company put into effect? Have they proven useful? Please share them here. These are the types of learnings we’d much rather gain by reading a blog post than from firsthand experience.