@BETALAUNCH.HQ
Chapter 7: Measuring the Progress of a Startup
Startups need to take the task of measuring their progress extremely seriously. The financials set down in the business plan include projections of how many customers the startup expects to attract, how much it will spend, and how much revenue and profit it expects to earn.
A startup must honestly measure where it stands right now, exposing whatever that assessment reveals, and then conduct experiments to learn how to move the real numbers closer to the ideal numbers set down in the business plan. Most products, including the ones that fail, do not have zero traction: most have some customers, some growth and some positive results. Entrepreneurs and employees tend to be naturally optimistic; they want to keep believing in their ideas even when the signs are ominous. That's why perseverance can be extremely dangerous. We have all heard stories of entrepreneurs who managed to become successful when everything seemed to be going wrong. Unfortunately, we don't usually hear stories of the entrepreneurs who persevered for too long and drove their companies to failure.
7.0 Innovation Accounting
Innovation accounting helps startups objectively prove that they are learning how to develop a sustainable business. It begins by converting the leap-of-faith assumptions into a quantitative financial model. Every business plan at a startup has some kind of model attached to it, and that model offers assumptions about what the business will look like at a successful point in the future.
Innovation accounting works in three steps:
1. Use an MVP to establish actual data on where the company currently stands.
2. Tune the engine from the baseline towards the ideal. This can take many attempts. After the startup has made the small changes and product optimizations needed to move its baseline towards the ideal, the company reaches a decision point.
3. Pivot or persevere.
If the company is making good progress towards the ideal, it is learning appropriately and using that learning productively, in which case it should persevere. If not, the management must accept that its current product strategy is flawed and needs a serious change. When a company pivots, it starts the process over, re-establishing a new baseline and then tuning the engine from there. The sign of a successful pivot is that these engine-tuning activities are more productive after the pivot than they were before.
7.1 Establishing the baseline
A startup might build a complete prototype of its product and offer to sell it to real customers through its main marketing channel. This single MVP would test most of the startup's assumptions and establish baseline metrics for each assumption simultaneously. Alternatively, a startup might prefer to build separate MVPs aimed at getting feedback on one assumption at a time. Before building the prototype, the company might conduct a smoke test with its marketing materials, a traditional marketing technique in which customers are given the chance to preorder a product that hasn't been built yet. A smoke test measures only one thing: whether customers are interested in trying the product. Although this isn't enough to validate an entire growth model, it can be very useful to get feedback on this assumption before investing more money and other resources in the product.
These MVPs provide the first example of a learning milestone. An MVP allows a startup to fill in real baseline data in its growth model – conversion rates, sign-up and trial rates, customer lifetime value and so on – which is valuable as the foundation for learning about customers and their reactions to a product, even if that foundation begins with bad news. When choosing among the many assumptions in a business plan, it makes sense to test the riskiest assumptions first. If you can't find a way to mitigate these risks towards the ideal required for a sustainable business, there's no point testing the others.
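As a rough illustration of what a baseline might look like in practice, the sketch below derives a few funnel metrics from hypothetical MVP numbers; the event counts and metric names are placeholders, not figures from the book.

```python
# Minimal sketch: deriving baseline funnel metrics from an MVP cohort.
# All counts below are hypothetical placeholders.

mvp_cohort = {
    "visited": 1000,      # people who saw the landing page
    "signed_up": 120,     # people who registered
    "started_trial": 60,  # people who tried the product
    "paid": 9,            # people who became paying customers
}

def rate(numerator: str, denominator: str) -> float:
    """Conversion rate between two funnel steps."""
    return mvp_cohort[numerator] / mvp_cohort[denominator]

baseline = {
    "sign_up_rate": rate("signed_up", "visited"),
    "trial_rate": rate("started_trial", "signed_up"),
    "paid_conversion": rate("paid", "started_trial"),
}

print(baseline)
# These figures form the baseline that engine-tuning work must move
# towards the ideal numbers written into the business plan.
```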
7.2 Tuning the engine
After the baseline has been established, the startup can work towards its second learning milestone: tuning the engine. Every product development, marketing or other initiative a startup undertakes should be aimed at improving one of the drivers of its growth model. For example, a company might spend time improving its product design to make it easier for new customers to use. This presumes that the activation rate of new customers is a driver of growth and that its baseline is lower than the company would like. To demonstrate validated learning, the design changes must improve the activation rate of new customers; if they don't, the new design should be judged a failure. This is an important principle: a good design is one that changes customer behaviour for the better.
7.3 Pivot or persevere
Eventually, a team that is learning its way towards a sustainable business will see the numbers in its model rising from the terrible baseline established by the MVP and converging towards the ideal set down in the business plan. A startup that fails to do so will see that ideal receding ever further away. If you're not moving the drivers of your business model, you're not making progress, and that is a sure sign it's time to pivot.
7.4 Innovation accounting at IMVU
IMVU's MVP had many issues, and when they first released it, sales were low. They assumed the low sales were due to the poor quality of the product, so for weeks they worked on improving it. At the end of each month, they had a board meeting at which they would present the results. Before each board meeting, they would run their standard analytics – conversion rates, customer counts and revenue – to demonstrate what a good job they had done. However, the improvements in product quality weren't changing customer behaviour at all. As a result, they started tracking their metrics more often and tightened the feedback loop with product development. Sadly, their product changes still produced no change in customer behaviour.
They then started tracking the 'funnel metric' behaviours that mattered to their engine of growth: customer registration, application download, trial, repeat usage and purchase. To learn anything, they needed enough customers using the product to get accurate numbers for each behaviour. They allocated a budget of $5 a day, which was enough to buy clicks on the then-current Google AdWords system. In those days, the minimum you could bid for a click was 5 cents, but there was no minimum on overall spending, so they could afford to open an account.
That $5 bought them 100 clicks a day. From a marketing point of view this was insignificant, but for learning it was invaluable. Every day they could measure their product's performance with a brand-new set of customers, and every time they improved the product, they got a new report card the next day on how they were doing. In effect, they were conducting a randomized trial each day, with each day's customers independent of the previous day's. Most importantly, although their gross numbers were growing, it became clear that their funnel metrics were not changing.
One of IMVU's graphs, covering seven months of work, showed that they were making frequent changes to the product, releasing new features daily. They were also conducting many in-person customer interviews, and their product development team was working extremely hard.
7.5 Cohort analysis
To understand their graph, they needed the concept of cohort analysis, one of the most important tools of startup analytics. Although it may sound complicated, it is based on a simple premise: rather than looking at cumulative totals or gross numbers such as total revenue and total number of customers, one looks separately at the performance of each group of customers that comes into contact with the product. Each such group is called a cohort.
Managers with a sales background will recognize this funnel analysis as the traditional sales funnel used to manage prospective customers on their way to becoming actual customers; Lean Startups use it for product development as well. The technique is valuable for many kinds of business, because every company's survival depends on sequences of customer behaviour called flows. Customer flows govern how customers interact with a company's products; they allow you to understand a business quantitatively, and they have far more predictive power than traditional gross metrics. Coming back to IMVU's graph: no matter how many improvements, focus groups, design sessions and usability tests were conducted, the percentage of customers who paid money for the product remained the same as it had been at the beginning. The cohort analysis thus confirmed that IMVU's product had a problem. Once they understood what the problem was, they quickly made a vital pivot: away from an IM add-on used with existing friends and towards a stand-alone network in which people could make new friends. Once their changes were consistent with what customers wanted, their experiments were far more likely to change customer behaviour for the better.
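For readers who want to see the mechanics, here is a minimal sketch of a cohort-based funnel report built from made-up records rather than IMVU's real data; the point is that gross totals can grow month after month while the per-cohort yield stays flat.

```python
# Minimal sketch of cohort-based funnel analysis (hypothetical data).
# Gross totals can rise every month while per-cohort conversion stays flat,
# which is exactly what a cohort view exposes.

from collections import defaultdict

# Each record: (cohort_month, user_id, furthest_funnel_step_reached)
events = [
    ("2005-01", 1, "paid"), ("2005-01", 2, "trial"), ("2005-01", 3, "registered"),
    ("2005-02", 4, "paid"), ("2005-02", 5, "trial"), ("2005-02", 6, "registered"),
    ("2005-02", 7, "registered"),
]

FUNNEL = ["registered", "trial", "paid"]

def cohort_report(records):
    """Per-cohort percentage of users reaching each funnel step."""
    cohorts = defaultdict(list)
    for month, user, step in records:
        cohorts[month].append(step)
    report = {}
    for month, steps in sorted(cohorts.items()):
        total = len(steps)
        report[month] = {
            s: sum(FUNNEL.index(x) >= i for x in steps) / total
            for i, s in enumerate(FUNNEL)
        }
    return report

for month, rates in cohort_report(events).items():
    print(month, {k: f"{v:.0%}" for k, v in rates.items()})
```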
They began to see real improvements and eventually made millions of dollars a month. This is the sign of a successful pivot: the new experiments are more productive than the experiments run before. Poor quantitative results force you to acknowledge failure and create the motivation, context and space for more qualitative research. That research produces new ideas and new hypotheses to be tested, which may lead to a pivot.
7.6 Optimization vs learning
Engineers, designers and marketers are all skilled at optimization. However, optimization by itself isn't enough for a startup: if you're building the wrong product, optimizing it or its marketing won't yield significant results. A startup has to measure its progress against a high bar: evidence that a sustainable business can be built around its product or service. That can be assessed only if the startup has made clear, tangible predictions ahead of time. In the absence of such predictions, product and strategy decisions become far more difficult and time-consuming.
Regardless of size, companies that already have a working engine of growth can still come to rely on the wrong metrics to guide their actions. This tempts managers to resort to the usual tricks – last-minute ad buys, channel stuffing and so on – in a desperate attempt to make the gross numbers look good. Eric Ries calls the traditional numbers used to judge startups 'vanity metrics', and innovation accounting requires that you resist the temptation to use them.
7.7 Vanity metrics
Consider IMVU again as an example of vanity metrics. One of their graphs showed total registered users and total paying customers, and it looked good. But when the same data was viewed cohort by cohort, it was clear that IMVU was adding new customers without improving the yield on each customer group: the engine was being tuned, but the tuning efforts weren't working. Innovation accounting cannot work if a startup is misled by vanity metrics such as gross numbers of customers. The alternative metrics for judging your business and your learning milestones are called actionable metrics.
7.8 Actionable metrics vs vanity metrics
To get a fuller understanding of vanity metrics, consider a company called Grockit, founded by Farbood Nivi, who had taught at Princeton Review and Kaplan, helping students prepare for the GMAT, LSAT and SAT. Although he won awards for his teaching, he was unhappy with traditional teaching methods, and he eventually developed a better approach using a mixture of teacher-led lectures, homework and group study. He was especially struck by the effectiveness of student-to-student, peer-driven learning, which benefited his students in two ways:
1. Students could get customized help from a peer, who was far less intimidating than a lecturer.
2. Students could reinforce their learning by teaching their peers.
Over time, Farbood's classes became more productive and more social, and he felt that his presence in the classroom was becoming less critical. His idea was to bring peer-to-peer social learning to people who couldn't afford expensive tuition from Kaplan or Princeton Review or a costly private tutor. This is how Grockit was born. Today Grockit offers many educational products, but initially it followed a lean approach: its MVP was simply Farbood teaching test prep over WebEx. He didn't build any custom software or new technology; he merely wanted to bring his teaching approach to the internet. News of this new kind of online private tutoring spread quickly, and within a few months Farbood was making a living. However, he hadn't built his MVP simply to make a living; he did it because he had a vision of bringing a more collaborative and productive way of teaching to students everywhere. With his initial traction, he raised money from prestigious investors.
Thanks to their partnership with Pivotal Labs, Grockit's product development team followed a rigorous version of the agile development methodology known as extreme programming, and their early product was praised by the press as a breakthrough. However, customer usage wasn't growing as much as they hoped. Following agile practice, Grockit's work proceeded in a series of sprints, or one-month iteration cycles. For each sprint, Farbood would prioritize the work to be done that month by writing a series of user stories, which described each feature from the customer's point of view rather than as a technical specification. These stories helped the engineers keep the customer's perspective in mind throughout the development process. Farbood could reprioritize the stories at any time; as he learned more about what customers wanted, he could reshuffle the product backlog, the queue of stories yet to be built. The only constraint was that he couldn't interrupt tasks already in progress; fortunately, because the stories were written in nontechnical, understandable terms, the batch size of the work in progress was only a day or two.
This kind of system is called agile development because teams that employ it can change direction quickly, stay flexible and be highly responsive to changes in the business requirements of the product owner. The team continuously delivered new product features, and the customer feedback they gathered through anecdotes and interviews suggested that at least some customers liked them. Nevertheless, Eric Ries sensed that Farbood and his team had doubts about the company's overall progress. The product was improving daily, but Farbood wanted to be sure those improvements actually mattered to customers. Unlike many entrepreneurs who cling to their original vision no matter what happens, Farbood was willing to put his vision to the test. Still, he couldn't be sure his team was learning anything: the engineers simply responded to the business's continuously changing requirements, and they weren't accountable for the quality of the business decisions made by the person making them, Farbood himself.
What he and his team didn't realize was that Grockit's progress was being measured with vanity metrics: the total number of customers and the total number of questions answered. Chasing those numbers kept the team busy without making the company genuinely better, and Farbood was unhappy with how little he was learning from customer feedback. The kind of metric the team focused on also changed from cycle to cycle: one month they would look at gross usage numbers, another month at registration numbers, and so on. Because those metrics fluctuated, it was hard to prioritize work correctly. He could have asked a data analyst to investigate specific questions – when we shipped feature Y, did it affect customer behaviour? When did feature Y ship? Which customers were exposed to it? Was anything else launched at the same time? – but finding the answers would have required an enormous amount of data, and by the time they arrived, the team would already have moved on to new priorities and new questions demanding urgent attention.
7.9 Cohorts and split tests
Grockit changed the metrics they used to evaluate success in two ways. First, instead of looking at gross metrics, they switched to cohort-based metrics. Second, instead of looking for cause-and-effect relationships after the fact, they launched each new feature as a true split-test experiment.
A split test is one in which different versions of a product are offered to different customers at the same time. By observing the differences in behaviour between the two groups, one can draw conclusions about the impact of the different variations. Split testing often uncovers surprising facts; for example, many features that make the product better in the eyes of engineers and designers have no impact on customer behaviour at all. This was the case at Grockit, as it has been in every startup Eric Ries has seen. Although split testing may seem more complicated because it requires extra accounting and metrics to keep track of each variation, it almost always saves time in the long run by eliminating work that doesn't matter to customers. Split tests also help teams refine their understanding of what customers want and don't want. For example, Grockit's team kept adding new ways for customers to communicate with one another, hoping that these social communication tools would increase the product's value.
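As a concrete illustration, here is a minimal split-test sketch; the assignment logic and conversion counts are hypothetical assumptions, not Grockit's actual numbers or tooling.

```python
# Minimal split-test sketch (hypothetical data and assignment scheme).
# Users are assigned to a control or variant experience, and we compare
# the conversion rates of the two cohorts.

import random
from math import sqrt

def assign_variant(user_id: int) -> str:
    """Deterministic 50/50 assignment so a user always sees the same version."""
    random.seed(user_id)          # hash-based bucketing would also work
    return "variant" if random.random() < 0.5 else "control"

# outcomes[group] = (converted, exposed) -- placeholder counts
outcomes = {"control": (45, 1000), "variant": (52, 1000)}

def conversion(group: str) -> float:
    converted, exposed = outcomes[group]
    return converted / exposed

def z_score() -> float:
    """Two-proportion z-test: is the difference bigger than noise?"""
    (c1, n1), (c2, n2) = outcomes["control"], outcomes["variant"]
    p_pool = (c1 + c2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (c2 / n2 - c1 / n1) / se

print(f"user 42 sees: {assign_variant(42)}")
print(f"control:  {conversion('control'):.1%}")
print(f"variant:  {conversion('variant'):.1%}")
print(f"z-score:  {z_score():.2f}")   # |z| < ~1.96 -> no convincing change
```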
Built into those efforts was the assumption that customers wanted more interaction while studying. When the split tests revealed that the extra features didn't change customer behaviour, that assumption was called into question. The doubt pushed the team to understand far more deeply what customers actually wanted, and they brainstormed new ideas for product experiments that might have a bigger influence on customer behaviour. Many of those ideas weren't new at all; they had simply been neglected because the company was so focused on building social tools.
One of those ideas was to test a solo studying mode, so that students could choose between studying by themselves or with others. It proved highly effective, and without split testing the company would never have pursued it. Eventually, through many tests, it became clear that student engagement rose when students were offered a mixture of social and solo features and could pick their preferred mode of study.
7.10 Kanban principle
Kanban is an agile project-management technique designed to help teams visualize work, limit work in progress and increase productivity. In adopting it, Grockit changed its product prioritization process. Under the new system, user stories were not considered complete until they led to validated learning. Accordingly, stories were catalogued as being in one of four states of development on the kanban board: in the product backlog, in progress (actively being built), built (done from a technical point of view) or validated. Validated was defined as knowing whether the story was a good idea to have been built in the first place. This validation usually came in the form of a split test showing a change in customer behaviour, but it might also include customer interviews or surveys.
The kanban rule permitted only a limited number of stories in each of the four states. As stories flowed from one state to the next, the buckets filled up, and once a bucket was full, it could not accept any more stories. Only when a story had been validated could it be removed from the kanban board. If validation failed and the story turned out to be a bad idea, the corresponding feature was removed from the product.
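A minimal sketch of such a board with work-in-progress limits is shown below; the state names, the limit of three and the class design are illustrative assumptions, not a description of Grockit's actual tooling.

```python
# Minimal sketch of a kanban board with work-in-progress limits, in the
# spirit of the rule described above (names and limits are illustrative).

from collections import deque

STATES = ["backlog", "in_progress", "built", "validated"]
WIP_LIMIT = 3  # maximum number of stories allowed in each state

class KanbanBoard:
    def __init__(self):
        self.columns = {state: deque() for state in STATES}

    def add_story(self, story: str) -> bool:
        """Add a story to the backlog if there is room."""
        return self._push("backlog", story)

    def advance(self, story: str) -> bool:
        """Move a story to the next state if the next bucket has room."""
        for i, state in enumerate(STATES[:-1]):
            if story in self.columns[state]:
                nxt = STATES[i + 1]
                if len(self.columns[nxt]) >= WIP_LIMIT:
                    return False          # bucket full: go validate something
                self.columns[state].remove(story)
                self.columns[nxt].append(story)
                return True
        return False

    def complete_validation(self, story: str) -> None:
        """Only validated stories ever leave the board."""
        self.columns["validated"].remove(story)

    def _push(self, state: str, story: str) -> bool:
        if len(self.columns[state]) >= WIP_LIMIT:
            return False
        self.columns[state].append(story)
        return True

board = KanbanBoard()
board.add_story("solo study mode")
board.advance("solo study mode")   # backlog -> in_progress
```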
Eric Ries has implemented this kanban system with many teams, and the initial result is always frustrating: each bucket fills up, starting with the validated bucket and moving back towards the built bucket, until no more work can be started. The only way to begin work on new features is to investigate some of the stories that are built but not yet validated, which usually requires non-engineering effort: talking to customers, looking at split-test data and so on. Eventually everyone gets used to it. Engineering might complete a big batch of work, followed by a burst of testing and validation, and as engineers look for ways to increase their productivity, they realize that if they include the validation exercise from the start, the whole team becomes more productive.
What matters most about the kanban principle is that teams begin to measure their productivity in terms of validated learning rather than the production of new features.
7.11 Testing hypotheses at Grockit
Grockit decided to test one of its main features, lazy registration, to see whether it was worth the heavy investment being made in it. With lazy registration, customers didn't have to provide any details to sign up for the study programme; they could start using the service immediately and were asked to register only after they had had a chance to experience its benefits. For Grockit, this was a test of one of their key assumptions: that customers would be willing to adopt the new way of learning only if they could see it working before they registered for the service.
Because of this hypothesis, Grockit's design had to manage three classes of students: unregistered students, registered students in a trial, and students who had paid for the premium version of the product. This design required a lot of extra work to build and maintain: the more classes of students there were, the more effort was needed to track them, and the more marketing effort was required to create the right incentives for customers to upgrade to the premium service. Grockit had taken on this extra effort because lazy registration was considered an industry best practice.
Eric Ries encouraged the team to try a simple split test: take one cohort of customers and require them to register immediately, on the basis of nothing more than Grockit's marketing materials. Surprisingly, this cohort behaved exactly the same as the lazy-registration group in terms of registration, activation and subsequent retention rates. In other words, the extra effort of lazy registration was a complete waste, even though it was an industry best practice.
The test yielded an insight even more important than the reduction of waste: customers were basing their decisions about Grockit on something other than their use of the product. The cohort of customers who registered before entering a study session with other students had very little information about the product, while the lazy-registration group knew much more because they had already used it. Yet the behaviour of the two groups was the same. This suggested that improving Grockit's positioning and marketing might have a bigger impact on attracting new customers than adding new features would.
Grockit continues to develop its process and has helped a huge number of students prepare for and pass their exams.
7.12 The value of the 3 A’s
Grockit demonstrates the value of the 3 A’s of metrics – actionable, accessible and auditable.
Actionable
For a report to be actionable, it must demonstrate clear cause and effect; otherwise it is a vanity metric. The reports Grockit used to judge its learning milestones made clear what actions would be needed to reproduce the results. Vanity metrics fail this test, and they cause real damage because they prey on a weakness of the human mind. In Eric Ries's experience, when numbers go up, people assume the improvement was caused by their own actions, by whatever they happened to be working on at the time. That's why it's so common to sit in a meeting where marketing believes the numbers went up because of its new PR or marketing campaign while engineering believes they went up because of the newly added features. Finding out what is actually going on is costly, so most managers simply move on, doing the best they can to form a judgment based on their experience and the collective intelligence in the room.
When the numbers go down, however, the reaction is quite different: suddenly it's nobody's fault. Actionable metrics are the antidote to this problem. When cause and effect are clearly understood, people are far better able to learn from their actions.
Accessible
Too many reports are not understood by the employees and managers who are supposed to use them to guide their decision-making. Sadly, most managers don't respond to this problem by working with the data warehousing team to simplify the reports so they can understand them better. Departments, too, often spend their energy learning how to use data to get what they want rather than treating it as genuine feedback to guide their future actions.
There is a solution to this misuse of data. First, make the reports as simple as possible so that everyone understands them. The easiest way to make reports comprehensible is to use tangible, concrete units: nobody is quite sure what a website hit is, but everyone knows what a person visiting a website means. This is why cohort-based reports are the gold standard of learning metrics; they turn complex actions into people-based reports. Each cohort analysis says: among the people who used our product in this period, here is how many exhibited each of the behaviours we care about. The report deals with people and their actions, which are far more useful than piles of data points. For example, it would have been hard to tell whether IMVU was succeeding if they had reported only the total number of person-to-person conversations. Suppose there were 10,000 conversations in a given period: is that one extremely social person, or 10,000 people each trying the product once and then giving up? There is no way to know without a more detailed report.
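To make the contrast concrete, here is a tiny sketch that turns a raw conversation log into a people-based view; the log is invented for illustration, not IMVU data.

```python
# Minimal sketch of turning a raw event log into a people-based report
# (hypothetical conversation log).

from collections import Counter

# Each entry: the id of the user who started a person-to-person conversation.
conversation_log = [1, 1, 1, 2, 3, 3, 4, 4, 4, 4]

gross_conversations = len(conversation_log)   # the vanity view
per_person = Counter(conversation_log)        # the people-based view

print(f"total conversations: {gross_conversations}")
print(f"people with at least one conversation: {len(per_person)}")
print(f"conversations per person: {dict(per_person)}")
# 10,000 conversations could be one hyper-social user or 10,000 one-time
# users; only the people-based breakdown distinguishes the two.
```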
Accessibility matters more as the gross numbers grow. It's hard to grasp what it means when website hits fall from 250,000 in one month to 200,000 the next, but most people immediately understand what it means to lose 50,000 customers.
Accessibility also means broad access to the reports. Grockit did this well: every day its system generated a document containing the latest data for every one of its split-test experiments and other leap-of-faith metrics. This document was emailed to every employee, and the reports were easy to read, with each experiment and its results explained in plain English.
Another way of making reports accessible comes from a technique developed at IMVU. Rather than housing analytics data in a separate system, IMVU treated its reporting data and infrastructure as part of the product itself, owned by the product development team. The reports were available on their website to anyone with an employee account: each employee could log in, choose from a list of all current and past experiments, and see a one-page summary of the results. Over time, those one-page summaries became the de facto standard for settling product arguments throughout the organization; when people needed evidence to support something they had learned, they would bring a printout to the meeting.
Auditable
When we are told that our pet project is a failure, we tend to blame the messenger, the data or the manager. That is why auditability matters: the data must be credible to employees. Most of the time, the lack of credibility stems from simple neglect. Most data reporting systems are not built by the product development teams whose job it is to prioritize and build product features; they are built by managers and analysts. Managers who use these systems can at best check whether the reports are internally consistent, but they rarely test whether the data is consistent with reality.
The solution is to check the data against reality by talking to customers; it is the only way to verify that the reports reflect true facts. Systems that provide this kind of auditability give managers and entrepreneurs the chance to gain insight into why customers behave the way the data says they do. In addition, the mechanisms that generate the reports should not be overly complicated: wherever possible, reports should be drawn directly from the master data rather than from intermediate systems, which reduces the opportunity for error.
Conclusion
Innovation accounting is a valuable tool that helps startups objectively prove that they are learning to develop a sustainable business. It has 3 steps:
1. Establish the baseline.
2. Tune the engine.
3. Pivot or persevere.