Classifying Steps with Machine Learning

For a non-technical introduction to the concept of Machine Learning, please read Making Step Counting Smarter.

When we first began to explore the idea of building a step classifier, we knew we would be constrained to a very limited population of individuals (Jawbone employees) for early development and testing.

It seemed certain that development of the classifier would be highly iterative: as we tested larger and more varied sets of individuals and behaviors, we would undoubtedly find issues that needed quick correction. So we needed a technical approach suited to rapid updates, and those updates would need to be essentially risk free. We could not afford the risk and development time of writing new code as we iterated. In short, we needed a step classifier that learned.

We settled on the notion of a step classifier as a mathematical model – not a collection of code in some programming language.  It could be considered data (not code) and, as such, it would be possible to simply replace one model (a block of data) with another model (a different block of data) without incurring a huge coding and Quality Assurance penalty.  We would just need a single, unchanging block of code able to interpret those changing models.

The mechanism we chose to construct these mathematical models is machine learning. So, how does this work?

In order to train the step classifier we provide the learning algorithm with enormous numbers of ‘labeled examples’.  A single example is simply a short snippet of accelerometer data, with a label indicating whether that snippet corresponds to one, two, or three steps.  Features of the accelerometer stream are then defined to describe each snippet.  

A feature is simply a mathematical function of some kind – how many peaks in the snippet, the distance between the highest peak and the lowest trough, etc.  Once the features are defined, each snippet becomes a point in the space of features.  Figure 1 illustrates how a variety of such snippets might look when visualized in a space of two such features (the x and y dimensions of the plot).  Note how the points corresponding to different step counts form regions in the space – the challenge is for the machine learning algorithm to learn how to form the boundaries that separate those regions.
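To make this concrete, here is a minimal sketch of a feature extractor in Python. The two features are the illustrative ones named above (peak count and peak-to-trough distance); the function name and the toy snippet are our own inventions for this sketch, and the real classifier uses a much richer feature set.

```python
import numpy as np

def extract_features(snippet):
    """Map a snippet of accelerometer readings to a point in feature space."""
    snippet = np.asarray(snippet, dtype=float)
    # Feature 1: number of peaks (samples strictly greater than both neighbors).
    peaks = np.sum((snippet[1:-1] > snippet[:-2]) & (snippet[1:-1] > snippet[2:]))
    # Feature 2: distance between the highest peak and the lowest trough.
    peak_to_trough = snippet.max() - snippet.min()
    return np.array([peaks, peak_to_trough])

# A toy snippet with three peaks and a peak-to-trough distance of 1.5:
point = extract_features([0.0, 1.0, 0.25, 1.5, 0.5, 1.0, 0.25])
```

Each snippet thus becomes a point like `[3, 1.5]` in a two-dimensional feature space, exactly the kind of point plotted in Figure 1.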



Figure 1.


In order to train the classifier we must gather and label enormous numbers of snippets (henceforth, we’ll call them examples). We first define a rich set of behaviors (walking on dirt in sneakers, walking on grass barefoot, walking while holding a cell phone to your ear, running a 6 minute mile, etc.) as well as a collection of demographic types (male, female, short, tall, thin, heavy, etc.). For each combination of behavior and demographic type, we identify matching individuals and ask them to perform the selected activity while we collect the raw accelerometer data. We call those collections “recordings”. During each recording, a human observer counts the number of steps taken by the participant. These true step counts are used as the labels for the examples contained within the recording. When sufficient numbers of examples have been collected, the machine learning algorithm extracts the features and constructs a mathematical model that describes the boundaries separating the labeled points.
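The learning step itself might look something like the sketch below. We are not disclosing the actual model family or features used on the band; scikit-learn’s decision tree and the handful of toy examples here merely stand in for any learner that carves feature space into labeled regions.

```python
from sklearn.tree import DecisionTreeClassifier

# Each row is one example's feature vector (here: peak count, peak-to-trough
# distance); each label is the observer's true step count for that snippet.
# Real training uses enormous numbers of examples; these few just show shape.
X = [[1, 0.4], [1, 0.5], [2, 0.9], [2, 1.0], [3, 1.4], [3, 1.6]]
y = [1, 1, 2, 2, 3, 3]

# The fitted model encodes the region boundaries in the space of features.
model = DecisionTreeClassifier().fit(X, y)
```

The fitted `model` is exactly the kind of replaceable block of data described earlier: swapping in a newly trained model changes the boundaries without changing any code.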

Figure 2 illustrates what the boundaries separating the regions might look like.  Note that the boundaries are not perfect – errors are made in that points with a particular label (e.g. “2 steps”) may fall inside the regions defined for other points (e.g. “1 step”).



Figure 2.


Once the mathematical model (we’ll subsequently call it the classifier) is learned, it can be loaded on the UP band and used to make live predictions. As you walk (or run), the accelerometer stream is sliced up into snippets and features are derived from those snippets. Again, each snippet becomes a point in the space of features – but this time the label is unknown and must be predicted. A predicted label is assigned to the point based upon the region where that point “lands” in the space of features. Figure 3 illustrates a single, unlabeled point (grey dot) landing in the region populated by points labeled “2 steps” and, as such, the classifier predicts that the unknown snippet of accelerometer data corresponds to the user walking two steps. This process runs continuously, counting your daily steps as it goes.
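On-band prediction can be sketched in the same spirit: each unlabeled feature point is assigned the label of the region it lands in, and the predicted step counts are accumulated. As before, the learner, the toy data, and the helper function are hypothetical stand-ins, not the band’s actual implementation.

```python
from sklearn.tree import DecisionTreeClassifier

# A stand-in for the learned classifier, trained offline on toy feature points.
X = [[1, 0.4], [2, 0.9], [2, 1.0], [3, 1.5]]
y = [1, 2, 2, 3]
classifier = DecisionTreeClassifier().fit(X, y)

def count_steps(feature_points):
    """Predict a step count for each unlabeled point and accumulate the total,
    mirroring how the band turns a stream of snippets into a step count."""
    return sum(int(label) for label in classifier.predict(feature_points))

# Three incoming snippets, already reduced to points in feature space:
total = count_steps([[2, 0.95], [1, 0.45], [3, 1.4]])  # 2 + 1 + 3 = 6 steps
```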



Figure 3.


As mentioned earlier, the boundaries defined in the learned model are not perfect. Some unlabeled snippets will land in the wrong regions and, as a result, step count errors will be made. For example, during development of the classifier that launched with UP2 and UP3, we encountered a number of situations where errors were made. Our VP of Software noticed one day that his steps were undercounted as he walked back to his desk carefully holding a full cup of coffee – snippets were landing in the wrong classifier region. To address this problem we needed to adjust the region boundaries and, for this, we needed to provide the machine learning algorithm with additional examples – examples of the problematic behavior, to be precise. So we asked volunteers to walk around the office holding cups full of coffee while we collected their data. We then folded that newly collected data into the corpus we had already assembled and let the machine learning algorithm learn from these coffee cup examples and deliver a new model. And, when we tested it, the “coffee cup problem” had been eliminated without the need to write or test new code. Figure 4 illustrates how the acquisition of new data results in adjusted region boundaries.



Figure 4.


There have been many instances where we have observed “pockets” of behavior that the step classifier did not handle properly. At one time we had trouble with males over 6 feet 4 inches who were walking slowly.  Another time we found we were undercounting the steps of females who were small in stature.  In each case the basic methodology was the same:

  1. Identify the behavior
  2. Collect recordings of that behavior
  3. Carve the recordings up into snippets of accelerometer data and label them
  4. Add the new examples to the existing corpus of labeled examples
  5. Let the machine learning algorithm learn from the new examples and construct revised region boundaries (a new model)
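Steps 4 and 5 can be sketched as a simple retraining function: fold the newly labeled examples into the corpus and refit, yielding revised region boundaries with no new classifier code. As in the earlier sketches, the decision tree and the toy corpus are hypothetical stand-ins.

```python
from sklearn.tree import DecisionTreeClassifier

# A hypothetical existing corpus of feature points and observer-counted labels.
corpus_X = [[1, 0.4], [2, 0.9], [2, 1.0], [3, 1.5]]
corpus_y = [1, 2, 2, 3]

def retrain(corpus_X, corpus_y, new_X, new_y):
    """Fold newly labeled examples into the corpus and learn a new model
    with revised region boundaries; no new classifier code is required."""
    corpus_X = corpus_X + new_X
    corpus_y = corpus_y + new_y
    return DecisionTreeClassifier().fit(corpus_X, corpus_y), corpus_X, corpus_y

# New recordings of a problematic behavior (steps 1-3), already carved into
# snippets, featurized, and labeled by a human observer:
model, corpus_X, corpus_y = retrain(corpus_X, corpus_y,
                                    [[1, 0.8], [1, 0.9]], [1, 1])
```

The returned `model` is the new block of data that replaces the old one on the band.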

We are excited and encouraged by the success of this approach because it means that our classifier will always get better and will be able to learn to handle new behaviors as we encounter them.  Shortly after launch, for example, we discovered from one of our users that there was a step count problem associated with a very particular kind of elliptical machine.  So, as you might imagine, we are gathering elliptical machine examples as we write this.

In effect, our Jawbone community really becomes an active part of this process.  As we listen to the community we learn where we have problems to solve and, in turn, our machine learning algorithm learns how to solve those problems.

About The Author

Stuart Crawford

Stuart is the VP of Algorithms at Jawbone. Stuart leads the data analytics team at Jawbone where he oversees development of all data products and algorithms that work with products like the UP and Jambox. Prior to joining Jawbone, he was the VP of Core Research at FICO where he was responsible for all technical aspects of the development of the classic FICO score as well as a variety of other scoring products. He was instrumental in accelerating the rate at which such scoring products can be developed, and developing new analytic techniques that can increase the predictive power of the scores themselves. Stuart is a veteran in the field of Data Science with a career spanning close to 30 years and holds 11 U.S. patents. He received a PhD from Stanford under the supervision of Prof. Jerry Friedman, one of the most influential scholars in statistical learning.