Prediction engines provide new ways to forecast the future

How much would you pay to be able to predict the future? Quite a bit, if you could predict tomorrow’s lottery number. What if you could predict with less precision, enough to let you know something good – or bad – was queued up just around the next corner? How much would it be worth to have precious time to prepare?

This tantalising possibility could come from a mouthful known as a ‘universal time-series transformer’. It applies cutting-edge developments in artificial intelligence to the physical world, promising a revolution in both how we think about processes, and how we build our systems to manage those processes.
At scale, universal time-series transformers might make our current mania for all-things-chatbot feel like a meagre entrée before a much more satisfying main.

To grasp the leap represented by universal time-series transformers we should first define what we mean by a ‘time series’.

A set of temperature readings over the course of a day provides a familiar example of a time series: the temperature at 7 am might be measured at 17 degrees, at 10 am 22 degrees, at 1 pm 28 degrees, at 4 pm 29 degrees, and at 7 pm 23 degrees. That’s five data points (17°, 22°, 28°, 29°, 23°), each with a unique timestamp (7 am, 10 am, 1 pm, 4 pm, 7 pm).
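For readers who like to see the data itself, here is a minimal sketch of those same five readings stored as a timestamped series in Python (the date is invented for illustration):

```python
# The day's five temperature readings as a timestamped pandas Series.
import pandas as pd

readings = pd.Series(
    [17, 22, 28, 29, 23],  # degrees Celsius
    index=pd.to_datetime([
        "2024-01-15 07:00", "2024-01-15 10:00", "2024-01-15 13:00",
        "2024-01-15 16:00", "2024-01-15 19:00",
    ]),
    name="temperature_c",
)
print(readings)
```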

Nothing remains static as the universe changes over time, and for that reason nearly every physical process can be represented in time series, snapshots of the fundamental dynamism of existence. A time series marries the phenomenal and the temporal, making process visible – and mathematically manipulable.

Humans have always been students of time series. Our pre-Homo sapiens forebears observed the seasons, able to predict that spring would follow winter, and possibly even aware of the ever-evolving phases of the Moon.

Homo sapiens learned to trace and predict the paths of the planets across the skies long before we developed writing or mathematics. Once we had those tools, we could advance our predictive capacities to encompass eclipses, even (as highly secret and mystical knowledge) the precession of the equinoxes.

All our astronomy – from the pre-human to the modern James Webb Space Telescope – draws upon time series. And while these roots are in astronomy, we know today that time series allow us to make predictions about any observable process. They’re universal.

Max Mergenthaler-Canseco from San Francisco startup Nixtla. Credit: Courtesy of Max Mergenthaler-Canseco.

“We always tell this story,” Max Mergenthaler-Canseco begins. “From the smallest hot dog stand, to the biggest bank in China. They need to forecast how many sausages, bonds, ingredients am I going to use in the next week? How many should I buy if it rains?”

The CEO and co-founder of San Francisco startup Nixtla, Mergenthaler-Canseco proceeds to name-check regression analysis, the most common classical technique for making time-series predictions. It’s the method his startup wants to disrupt.

“Most hot dog stand owners are not thinking of doing regression analysis to predict their hot dog consumption, but imagine they ask, in theory, ‘Hey Siri, how many hot dogs should I buy?’”

Regression analysis uses statistical modelling of historic data to generate the next value in a time series. For example, “This is how many hot dogs I’ve sold. How many hot dogs will I sell tomorrow?” From sales data collected over many months – alongside precipitation data for the same period – regression analysis provides a reasonably accurate prediction of how many snags will be sold in a downpour.
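As a rough illustration of that classical approach, the sketch below fits a linear regression of daily sales against rainfall and asks for a prediction on a rainy day. The numbers are invented, and the scikit-learn library stands in for whatever tooling a real analyst might use:

```python
# Toy regression: (invented) daily sales and rainfall readings, fitted with
# ordinary linear regression, then used to predict sales for a rainy day.
import numpy as np
from sklearn.linear_model import LinearRegression

rainfall_mm = np.array([[0], [2], [5], [10], [0], [8], [1]])   # one column per parameter
hot_dogs_sold = np.array([120, 105, 90, 60, 125, 70, 115])

model = LinearRegression().fit(rainfall_mm, hot_dogs_sold)

tomorrow = np.array([[6]])            # forecast: 6 mm of rain tomorrow
print(model.predict(tomorrow))        # expected hot dog sales in a downpour
```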

Yet change any of those parameters – make a prediction for the number of cans of soft drink that will be sold by that hot dog vendor, against the day’s high temperature – and you need an entirely different training data set, again collected over many months. Regression analysis works well enough in specific circumstances – but time-series data collected for one set of parameters has no predictive value for another set of parameters.

Nixtla – the team of Mergenthaler-Canseco, Azul Garza and Cristian Challu – reckons it’s found a route around this specificity, using a transformer. Developed at Google in 2017, the transformer has rapidly become the most important bit of software since the birth of the World Wide Web, forming the core of all of our large language models (LLMs), such as GPT-4 (powering ChatGPT and Microsoft’s Copilot), LLaMA (Meta AI), Google’s Gemini, and so on.

In essence, a transformer takes a string of input – generally, a bit of English-language text known as a prompt – then calculates the statistically most likely completion for that prompt. Breaking the prompt into roughly syllable-length tokens, the transformer pushes those tokens through a massive set of weightings, using those weights to determine the next most likely tokens to follow the given prompt.

For example, providing the prompt, “To be or not to be?” should generate the output “That is the question.” Why? Because the transformer’s weightings have been trained on trillions of words, scoured from every available corner of the Internet, including many copies of Shakespeare’s Hamlet, translations of Hamlet, commentaries on Hamlet, performances of Hamlet, parodies of Hamlet, etc. Every one of those instances adds to the model’s weightings, effectively producing a path that points directly to one output.
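Stripped of its billions of learned weights, that final step amounts to choosing the highest-scoring candidate for the next token. The toy below fakes the scores by hand purely to show the shape of the computation:

```python
# A toy stand-in for the transformer's last step: score each candidate next
# token, convert scores into probabilities, pick the most likely. The scores
# here are invented; a real model derives them from its learned weights.
import numpy as np

candidates = ["That", "Hot", "Whether", "Banana"]
logits = np.array([6.2, 0.1, 3.4, -2.0])          # invented scores
probs = np.exp(logits) / np.exp(logits).sum()     # softmax

prompt = "To be or not to be?"
print(prompt, "->", candidates[int(np.argmax(probs))])   # -> "That"
```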

But there’s another way to think about this – and here we come to the core of the innovation expressed in 2023’s ‘TimeGPT-1’ paper, co-authored by Garza, Challu and Mergenthaler-Canseco.

All of the text used to train LLMs represents sequential data; one letter, one word follows another. Some arrangements of letters and words make sense – that is, they’re probable – while other arrangements of letters and words make no sense, and are therefore highly improbable.

There’s a similarity in form between this flow of language and flows observed in the physical world. Given the appropriate weightings, a transformer should be able to generate the next value in a time series using exactly the same mechanism it employs to generate the next most likely word in a response to a prompt. They’re identical processes.
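To make the parallel concrete, here is the same toy step again, this time over candidate temperature readings rather than words. The probabilities are once more invented stand-ins for what a trained time-series transformer would compute:

```python
# The same "pick the most likely next token" step, over numeric values.
import numpy as np

context = [17, 22, 28, 29]                      # the series so far (the "prompt")
candidate_values = [21, 22, 23, 24, 25]         # possible next readings
logits = np.array([0.4, 1.1, 2.6, 1.0, 0.2])    # invented scores
probs = np.exp(logits) / np.exp(logits).sum()   # softmax

next_value = candidate_values[int(np.argmax(probs))]
print(context, "->", next_value)                # -> 23
```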

The vast array of training data fed into LLMs (a topic of much controversy and more than a few lawsuits) means that nearly every prompt put to a chatbot will produce a reasonable enough output. A well-fed LLM can generate a completion to any prompt put to it, without being specifically trained in any topic. Having been trained on every topic its creators could find, it draws upon all of that training as it generates outputs. LLMs are universal text generators.

Nixtla harnessed that same capability to generate time-series predictions. Rather than needing specific time-series data for every conceivable combination of parameters (precipitation versus cans of soft drink sold), TimeGPT-1 described a time-series transformer with a universal capacity to generate “good enough” predictions without highly specific training data.

TimeGPT-1 drew its predictions from a massive set of time-series data, snapshots of a range of physically observed processes. This raises the question: which data sets did the paper’s authors use when gathering up their hundred billion points of time-series training data?

“We realised that the diversity of data was very, very important,” Garza notes, “Because we were designing a model to work on basically every use case.”

Sausage sales, bond rates, electricity consumption: the universal time-series transformer must be fed a highly varied set of sources to reliably generate its predictions across a wide set of input time series.

“The exact list of the data set we use for training is secret,” Mergenthaler-Canseco replies. It’s part of Nixtla’s secret sauce, and the foundation for the startup’s product – a software-as-a-service tool that allows pretty much anyone who can write a bit of code to access their TimeGPT-1 universal time-series transformer. Bring your own time series – any time series – plug it in, and get a “good enough” prediction for the future.
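In practice, “plug it in” looks something like the sketch below. It assumes Nixtla’s published Python client, its documented ds/y column convention, invented sales data and a placeholder API key; the exact interface may differ from what is shown here:

```python
# A sketch of "bring your own time series": send ninety days of (invented)
# daily sales to Nixtla's hosted TimeGPT service and ask for a week ahead.
import pandas as pd
from nixtla import NixtlaClient

client = NixtlaClient(api_key="YOUR_API_KEY")   # placeholder key

df = pd.DataFrame({
    "ds": pd.date_range("2024-01-01", periods=90, freq="D"),  # timestamps
    "y": [100 + (i % 7) * 5 for i in range(90)],              # invented sales figures
})

forecast = client.forecast(df=df, h=7)   # predict the next 7 days
print(forecast.head())
```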

Doyen Sahoo of Salesforce AI Research. Credit: Courtesy of Salesforce.

Two scientists at Salesforce AI Research – Caiming Xiong and Doyen Sahoo – express less reticence when I ask where they got the data to train Moirai, their own universal time-series transformer. It’s the research product of their 2024 paper, ‘Unified Training of Universal Time Series Forecasting Transformers’. (Moirai gets its name from the Ancient Greek name for the Fates, the triple goddesses who foresaw the future.)

“We’re a big cloud company,” Xiong acknowledges. “We have a lot of AI infrastructure we need to manage. The most common data set we see in this infrastructure is time series: CPU utilisation, memory usage, and many other things like that.” Salesforce’s vast hardware infrastructure generates an inconceivable wealth of time-series data. “We collected a lot of publicly available operations data from the cloud to build a model that could forecast everything. That went well enough. We thought: ok, now we can collect a larger amount of data to train a massive model.”

Twenty-seven billion time-series data points and a lot of training runs later, they had their first version of Moirai.

Caiming Xiong of Salesforce AI Research. Credit: Courtesy of Salesforce.

Although Nixtla sees value in keeping their data set secret, Salesforce AI Research’s work reveals something that should have been obvious. Our world – crowded with sensors, each generating a massive flow of time-series data – has more than enough training data to satisfy anyone wanting to build their own universal time-series transformers.

Unlike LLMs, universal time-series transformers will not go wanting for lack of high-quality data. There’s just too much of it around. That means they won’t be artificially constrained by copyright or access to training data.

Universal time-series transformers should soon be cheap and plentiful, a point Sahoo and Xiong made by publicly releasing Moirai as open-source software. Anyone can download their data set at https://github.com/SalesforceAIResearch/uni2ts along with a set of ‘notebooks’ – programs that can be easily modified to suit a range of test cases – and put universal time-series transformers to the test.
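A rough sketch of what that looks like follows. It is modelled on the repository’s example notebooks; the specific model names, entry points and arguments are assumptions that may differ between releases:

```python
# Forecasting an (invented) univariate series with a pretrained Moirai model,
# loosely following the uni2ts example notebooks.
import pandas as pd
from gluonts.dataset.pandas import PandasDataset
from uni2ts.model.moirai import MoiraiForecast, MoiraiModule

# Any univariate series will do: here, a year of made-up daily readings.
df = pd.DataFrame(
    {"target": [230 + (i % 30) for i in range(365)]},
    index=pd.date_range("2023-01-01", periods=365, freq="D"),
)
dataset = PandasDataset(df, target="target")

model = MoiraiForecast(
    module=MoiraiModule.from_pretrained("Salesforce/moirai-1.0-R-small"),
    prediction_length=14,            # forecast two weeks ahead
    context_length=200,              # history the model attends to
    patch_size="auto",
    num_samples=100,
    target_dim=1,
    feat_dynamic_real_dim=0,
    past_feat_dynamic_real_dim=0,
)
predictor = model.create_predictor(batch_size=32)

forecast = next(iter(predictor.predict(dataset)))
print(forecast.mean)                 # point forecast for the next 14 days
```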

What are these models good for? Will we see hot dog vendors putting them to work before they place their order for the next day?

“Traditionally, a data scientist would create a single model and make it work for a time series. That means you have to retrain the model every day – which is super expensive. In contrast, you can have this universal forecaster, which you don’t have to train again, and which is very easy to integrate,” explains Sahoo.

Xiong points to an example we’re all familiar with: “Every month I open my bank account, and the bank will be able to say, ‘I predict you will spend how much money in the next month.’ So I can use that information on how I can save and spend.”

Sahoo and Xiong envisage a world where universal time-series transformers have been embedded in a very broad range of software to help predict – and plan for – the future.

Their usefulness quickly became clear to Nixtla, as its first clients put universal time-series transformers to work. Those client operations naturally generate time-series data, and plugging that data into TimeGPT-1 lets them quickly detect whether things are going to plan – that is, generating results within the expected range – or going awry.

This ability to detect anomalies looks like a massive win, one that will drive universal time-series transformers into nearly every industrial and logistics process.

Universal time-series transformers provide an essential form of feedback to keep systems on track – and sound the alarm when they look ready to leap their guardrails.
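A bare-bones sketch of the idea: flag any reading that falls outside an expected band. Here the band is a rolling mean plus or minus three standard deviations, standing in for the prediction interval a universal time-series transformer would supply:

```python
# Toy anomaly detection: compare each new reading against a band built
# from the readings that came before it, and flag anything outside it.
import pandas as pd

readings = pd.Series([100, 102, 99, 101, 180, 100, 98, 103])  # one spike

history = readings.rolling(window=4, min_periods=2)
mean = history.mean().shift(1)    # band is built from *earlier* readings only
std = history.std().shift(1)

anomalies = readings[(readings > mean + 3 * std) | (readings < mean - 3 * std)]
print(anomalies)                  # the 180 reading is flagged
```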

Although now dominant, the transformer has challengers.

“Other architectures – state space models, diffusers and so on – will work as well, if not better than transformers, once researchers apply the right tweaks,” predicts Dr Stephen Gould of the Australian National University’s School of Computing. “One thing I’m certain of is a future where these models will get better.”

Better models mean better predictions, and better predictions mean smarter, more effective systems. To get there we’ll need new kinds of human intelligence.

“We’ll have a new role coming up in the next few years,” Salesforce’s Sahoo reckons. “Working within organisations, helping them get the most from universal time-series transformers: time-series scientist.”
