Labelbox•January 21, 2020
Asking good questions isn’t the first step to building enterprise ML, it’s step 0
When email first came on the scene, it was revolutionary. People were excited to try a faster method of communicating with friends and business contacts. But of course, someone quickly found a way to ruin it.
Spam (unwanted bulk emails, not the canned meat product) started popping up in huge numbers, slowing down network speeds and people’s productivity. If we wanted to improve internet speeds, we’d have to identify spam before it clogged up inboxes.
Enter machine learning. Trained machine learning models can pinpoint spam emails and stop them before they reach users’ inboxes. These models can also help protect users from phishing scams. This means that most unsolicited bulk email goes directly to your spam folder. Thank you, machine learning.
Machine learning (a subset of AI) can help us solve many problems, especially those that are too labor-intensive to answer cost-effectively with human labor. Why is machine learning (ML) a viable way of answering questions?
AI and Machine Learning Aren’t Science Fiction Any Longer
AI used to be something out of science fiction. Star Trek: The Next Generation had an android crew member named Data who could think and react like a human. But for a while, we didn’t have the tech or experience to make AI work in the real world.
That’s all changed. Machine learning has advanced to the point that there’s a ready-made ML model out there to suit your project. You just have to pick the right one for your needs. We also have the technology, experience, skilled workers, and data sets to make these generic ML models useful for all types of projects.
There’s another benefit to improvements in machine learning models. More accessible ML models make machine learning easier and cheaper to use. Because you don’t necessarily have to build a model from scratch, you save resources.
Another reason that machine learning has become so useful is that computing itself has gotten much, much more affordable. The decreasing cost of computing has made ideas like deep neural nets (which underpin much of modern machine learning) viable. But beyond being an exercise in what’s possible, is there good reason to use machine learning?
In Star Trek, Data was a highly useful crew member because, as an android, he never tired out and didn’t make mistakes due to fatigue or mood. Machine learning models are similar; they can do practical tasks for a wide variety of industries without slowing down or tiring.
From determining whether images contain a specific animal for an environmental organization to identifying a patient’s risk of developing a serious medical condition, ML models can provide huge benefits, tirelessly.
These tasks aren’t just helpful. They produce real benefits for businesses that use machine learning. In a 2018 report, 63 percent of companies that used AI noted a revenue increase in the departments that used it. Plus, 44 percent reported that AI cut their costs.
AI and ML used to be a dream of science fiction. Now, they’re a viable way of solving the challenges we face in almost any field. However, if you want to leverage machine learning algorithms, you have to start with the right question.
Asking Good Questions Is About Understanding Machine Learning Limitations
Machine learning algorithms can do amazing things, but they aren’t capable of answering every question we have. Unlike in The Hitchhiker’s Guide to the Galaxy, they certainly can’t tell us the answer to the ultimate question (it’s 42) or answer subjective questions like what the meaning of life is. To understand how to use machine learning, you must understand its limitations.
Machine learning algorithms are great at identifying things in images and distinguishing between different objects (such as whether an item is a specific type of car or something else). But some questions take too much information to answer. What would your machine learning algorithm have to analyze to figure out if a cow is healthy? Let’s look at just a few things it may have to analyze.
- Weight: Is the cow in the right weight range?
- Appearance: Does the cow’s coat look thick and shiny? Are its eyes clear? Is it clean?
- Behavior: Is the cow eating enough food? Is it drinking water? Is it moving enough, or is it lying down for extended periods?
As we can see, there are many factors involved in a cow’s health and focusing on just one can be deceiving. This makes it hard to flag sick animals accurately in your machine learning model.
What if you could make your question more specific? For example, how many times does the cow eat in these images? That’s something your ML model could work with. All your model would need to do is track the cow and track the feed, operations that are pretty easy for ML models to do. While eating is just one factor in a cow's health, it could be used in conjunction with other ML models to flag warning signs and aid human analysts.
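To make the narrower question concrete, here is a minimal sketch of how "how many times does the cow eat?" might be answered once a tracker has already produced bounding boxes. Everything named here is an assumption for illustration: the `Box` type, the idea that the tracker emits one box per frame for the cow's head (or `None` when it is not detected), and the rule that "eating" means the head box overlaps a fixed feed box.

```python
from typing import NamedTuple, Optional, Sequence


class Box(NamedTuple):
    """Axis-aligned bounding box in pixel coordinates (hypothetical format)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def boxes_overlap(a: Box, b: Box) -> bool:
    """Return True when the two boxes share any area."""
    return (a.x_min < b.x_max and b.x_min < a.x_max
            and a.y_min < b.y_max and b.y_min < a.y_max)


def count_eating_episodes(head_boxes: Sequence[Optional[Box]], feed_box: Box) -> int:
    """Count transitions from 'not at the feed' to 'at the feed'.

    head_boxes holds one entry per frame; None means the cow was not detected.
    Consecutive overlapping frames count as a single episode.
    """
    episodes = 0
    was_eating = False
    for head in head_boxes:
        is_eating = head is not None and boxes_overlap(head, feed_box)
        if is_eating and not was_eating:
            episodes += 1
        was_eating = is_eating
    return episodes
```

A real system would likely need to smooth over brief detection dropouts so that one meal is not counted several times, but the shape of the problem stays the same: two tracked objects and a simple rule about when they meet.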
Ensuring that machine learning can answer your question is crucial. Your question affects every part of your ML model, from your data sets to your own mindset. Changing your question mid-project isn’t a minor inconvenience; it could mean you have to scrap everything you’ve done so far.
How can you craft a question that’s not only answerable but also worth building an ML model for? A firm grasp on where ML models excel can help you unlock the right question. Let’s look at the two areas where ML excels.
1. Machine Learning Excels at Supporting Human Employees
Highly skilled staff members often get stuck babysitting repetitive tasks. These tasks can take up too much valuable time, whether or not they are part of the employee’s primary job. They may require some specialized knowledge, but not enough to warrant wasting your best employees on them. For instance, bank employees often have to manually input information from financial documents, such as invoices.
Taking advantage of machine learning reduces the amount of repetitive work your skilled staff has to do and frees them up for the job they were hired for. In fact, using ML will make your process more consistent and may even make it more accurate.
For example, in the medical field, ML models can look through patients’ scans and x-rays to detect problems. They then flag those results so that a doctor can look through them to confirm the findings.
Combining machine learning algorithms with a doctor’s review not only helps catch false positives. It also reduces the mental fatigue a doctor faces when looking through test results day after day, which can lead to mistakes. Pairing ML algorithms with a doctor can produce final results that are more accurate than either would achieve alone.
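The flag-then-confirm workflow can be sketched in a few lines. This is an illustrative assumption rather than a description of any particular medical system: it supposes a model has already assigned each scan a score between 0 and 1, and that `review_threshold` is a hypothetical cutoff the team has chosen.

```python
from typing import Dict, List


def triage_scans(scores: Dict[str, float], review_threshold: float = 0.5) -> List[str]:
    """Return the IDs of scans a doctor should review, most suspicious first.

    scores maps a scan ID to a model score in [0, 1] (hypothetical format).
    Scans scoring at or above review_threshold are flagged for human review.
    """
    flagged = [scan_id for scan_id, score in scores.items() if score >= review_threshold]
    return sorted(flagged, key=lambda scan_id: scores[scan_id], reverse=True)
```

The model only orders and narrows the doctor's queue; the confirmation step stays with the human.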
Using machine learning to assist your staff with routine tasks makes your business more efficient. This brings us to the second area machine learning excels in. ML models can do work that is valuable, but too expensive to hire humans for.
2. ML Models Can Do Work that Is Too Expensive for Human Workers
Sometimes work is worthwhile, but the cost of hiring people to do the job is too high. This is especially true if the work has to be done on a large scale. Machine learning can help organizations solve this conundrum.
For instance, an insurance company might find it helpful to analyze satellite pictures of houses to discover possible fire hazards. However, going through and cataloging millions of houses manually would cause the cost to outweigh the potential reward.
Setting up machine learning algorithms can reduce the price tag enough to make the task valuable again. ML also speeds things up, which is great for dealing with images at scale. Taking advantage of ML models gives your business the chance to make investments that wouldn’t be fiscally responsible otherwise.
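The cost argument above comes down to simple arithmetic: ML usually trades a one-time setup cost for a lower cost per item. The sketch below computes the volume at which that trade pays off. The function and its parameters are hypothetical, and any figures you pass in are your own estimates, not data from this article.

```python
import math


def break_even_volume(setup_cost: float, manual_cost_per_item: float,
                      ml_cost_per_item: float) -> int:
    """Smallest number of items at which the ML approach is no more expensive.

    Manual total:  n * manual_cost_per_item
    ML total:      setup_cost + n * ml_cost_per_item
    """
    savings_per_item = manual_cost_per_item - ml_cost_per_item
    if savings_per_item <= 0:
        raise ValueError("ML never breaks even if it costs as much per item as manual work")
    return math.ceil(setup_cost / savings_per_item)
```

If the expected workload sits well above the break-even volume, as it would for millions of satellite images, the task moves from unaffordable to worthwhile.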
Basing your machine learning question on tasks that aid skilled human workers or that couldn’t be done profitably by human workers is a fantastic way to ensure your ML algorithm is valuable. What does it take to create your own ML algorithm?
Building Your ML: It’s All About the Data
Machine learning needs four things to work: the question, the inputs (a raw data set), outputs (a processed data set), and the model itself. Once you’ve figured out your question, you’ll be able to decide the parameters for the other three areas.
Choose the right model for what you need. There are plenty of open-source ML models you can pick and choose from. Even though these models are typically generalized, they will often work for your project with little or no modification, which saves valuable engineering time.
Customize your data sets. Keeping your data set as close as possible to the real-world conditions you’ll be analyzing will reduce errors. Even errors that seem small and unimportant could make your final version useless.
Let's say an environmental organization wanted to create an ML algorithm that would identify cats from photos, whether they were domestic or wild. Consumers would be able to send in photos and learn what kinds of cats were living in the area.
Unfortunately, they run into a hiccup. The organization doesn’t include enough images of cats in its data set, so the model doesn’t have enough data to correctly determine what is or isn’t a cat. Animals like hairless chihuahuas start being identified as hairless cats like the Sphynx, leading to false positives. Because the organization didn’t include enough images to simulate real-world conditions, the ML model didn’t work.
This brings us to the second problem that can come up with data sets. You can control what you feed into your model, but not what it learns from it. While annotating your data set can help, certain types of annotations are better at training your ML algorithm than others.
One team learned this the hard way when they tried to train a neural network to distinguish between types of canines. Instead of learning the difference between huskies and wolves, the network had learned that animals against snow or a white background were wolves, while animals that were indoors were huskies.
Inputting a data set that didn’t match reality skewed the results when it came to real-world applications. If you can’t create a data set that reflects your conditions, then you can take advantage of annotations and updates to keep your ML algorithm on track.
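One way to catch a husky-versus-wolf style shortcut before training is to check how strongly each label is tied to a single context. The sketch below assumes something the article does not provide: that each image carries a hypothetical context tag (such as "snow" or "indoor") alongside its label.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def context_by_label(records: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Tally how often each context appears for each label.

    records yields (label, context) pairs, e.g. ("wolf", "snow").
    """
    table: Dict[str, Counter] = defaultdict(Counter)
    for label, context in records:
        table[label][context] += 1
    return table


def dominant_context_share(table: Dict[str, Counter]) -> Dict[str, float]:
    """For each label, the fraction of its images that share the most common context."""
    return {
        label: max(counts.values()) / sum(counts.values())
        for label, counts in table.items()
    }
```

A label whose share is close to 1.0 is a warning sign: the model may be able to score well by learning the background rather than the animal, so that class probably needs images from more varied settings.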
Let’s look into the three main types of annotations you can use to teach your machine learning algorithm.
The Right Annotations Make the Best Teachers
There are three types of annotations you can use to label your images: whole image classification, bounding boxes, and segmentation masks. Each has drawbacks and benefits. Some give you more accuracy at the cost of more labor, while others give you more speed at the cost of accuracy. The right choice depends on your needs.
- Whole image classification is when you simply annotate what is in the image without specifying where it is. It is the cheapest and easiest way to annotate pictures, but it also leaves a lot of room for error.
- Bounding boxes are when you label the general perimeter of an object (drawing a box around it). This method is slightly more work than whole image classification, but it’s much more precise for training ML algorithms. There is still a risk of error since additional objects still may be captured within the perimeter.
- Segmentation masks involve annotating each pixel in an image. This is the most time-consuming and costly method. However, segmentation masks are by far the most accurate way to train ML algorithms, since you’re describing each pixel within the image.
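The trade-off between boxes and masks can be seen by deriving one from the other. This is a minimal sketch under an assumed representation: a mask is a list of rows of 0s and 1s, where 1 marks a pixel belonging to the object. The function names are illustrative.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]  # rows of 0/1 values; 1 marks an object pixel (assumed format)


def mask_to_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Tightest box (x_min, y_min, x_max, y_max) around the object, or None if empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return (min(cols), min(rows), max(cols), max(rows))


def box_fill_ratio(mask: Mask) -> float:
    """Fraction of the bounding box that is actually covered by the object."""
    box = mask_to_box(mask)
    if box is None:
        return 0.0
    x_min, y_min, x_max, y_max = box
    box_area = (x_max - x_min + 1) * (y_max - y_min + 1)
    object_pixels = sum(sum(row) for row in mask)
    return object_pixels / box_area
```

A mask can always be reduced to a box, and a box to a whole-image label, but not the reverse. A low fill ratio shows how much background or neighboring clutter a box sweeps in along with the object, which is the source of error noted for bounding boxes above.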
The type of annotation you choose depends on how good your data set is and what you’re training your ML algorithm to do. Once you begin testing your ML model, you may discover blind spots and mistakes in certain areas. Identifying where your ML model is having trouble and adjusting that part of your data set is one of the core pieces of training an accurate model.
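Finding blind spots usually starts with breaking results down by class instead of looking at a single overall score. The sketch below assumes you have two parallel lists from a test run, the true labels and the model's predictions; the function name is illustrative.

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy computed separately for each true label."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / totals[label] for label in totals}
```

Classes with noticeably lower accuracy than the rest are the parts of the data set to inspect first, whether that means adding more examples or switching to a more precise annotation type for them.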
Machine learning can help you turn large data sets into productive tools. As long as you have the right question in mind, use good data sets, and choose the right annotation schemes, you can build an ML model to solve your problem.
Machine Learning Can Be a Force for Good
Machine learning has amazing capabilities. From blocking spam in emails to helping doctors in hospitals diagnose patients, ML systems are a strong force for good. By answering your important questions, machine learning can help both you and others.
The right question affects every part of machine learning, from what model you choose to how you create your data set. Ensuring you ask questions ML algorithms can answer prevents wasted effort, and remembering where ML excels ensures that your question is valuable to your business.
Once you’ve discovered the right question, you need to compile and label a data set around it. At Labelbox, we provide teams with all the training data tools they need to make machine learning work for them. Are you ready to make machine learning your competitive advantage? Talk to one of our experts today.