Artificial intelligence (AI) raises a few recurring concerns that the technology industry is constantly working to address. One of the most important is the long-tail problem. The real-world scenarios an AI system may encounter are practically limitless, and many of them are individually rare, so it is extremely difficult for any AI solution to handle all of them.
Most AI systems lean heavily on supervised learning, in which a neural network is trained on labeled data. The weakness of this approach is how the model performs when it meets data from the fringes, or new data that was not adequately represented in the original training set. The biggest challenge is achieving strong performance on these rare edge cases.
Getting Caught Up in the Weeds
The long-tail problem is tripping up many data scientists today. The temptation is to focus too heavily on minute edge cases and optimize for them, rather than on the core AI solution. In some instances there are simply too many possibilities and edge cases to account for, and by chasing every minute scenario, you can fail to solve the actual problem. Overemphasizing edge cases can also leave the product with poorer overall performance and a weaker return on investment.
For example, programming a car for autonomous driving involves a virtually unlimited number of scenarios, some of which no one has ever encountered on a real road. Trying to capture every possible scenario in the training set could produce a program that never launches.
The Realistic AI Tech Company
The best approach any tech company can take is to be realistic about how the long-tail problem affects its work. Perfectionism has no place here and will ultimately be your downfall if you are trying to get a product out the door. Of course, everyone wants a high-quality solution, but when dealing with the long-tail problem of AI, it is best to accept that you will be iterating continually. Get comfortable with that fact now, and you will be better off down the road.
In practice, build a training dataset that is as comprehensive as is reasonable, but leave enough time to move on to training and testing. Testing is where you can identify the major gaps in the data and quickly update the dataset to fill them. This iterative approach recognizes that the solution is never complete; instead, it is always learning and always evolving to become better and smarter.
A Challenge in Project Management
The long-tail problem also makes managing resources and deciding on optimization and overall approach a major challenge for AI-based technologies. Because your solution will never be perfect, deciding how to allocate resources, time, and attention is a delicate balance: you need a solution that is reliable today but that can also continue to evolve.
In the Real World…
The long-tail problem of AI is familiar to those of us in this space, including many of the companies in the Sentiero portfolio. For example, Geminus.AI has a physics-first modeling engine that helps create complex digital twins grounded in the laws of physics rather than in an exhaustive catalog of edge cases. Other AI-based tech companies are thriving as well. They are building solutions that answer queries and improve the model as they go, solving problems others have not even considered. Each of them put out a solid solution from the start, but in every case their solutions keep evolving and improving with each update.
We love seeing how real businesses are solving these complex problems, all while creating solutions that are changing the world!