Opalesque Futures Intelligence

Guest Article / CTA Strategy Thoughts:

Mathematical Modeling for Quantitative Trading Strategies

Monday, May 07, 2012

By Michael S Rulle, Jr.

There is only one history in financial markets. But there is an almost infinite number of time series one can analyze. Think of all the combinations of markets, units of time (for example, one second, one minute, one hour, etc.) and periods of time within which these units reside (for example, one day, one week, one month, etc.). We characterize this way of viewing time series as analysis of "the distribution of distributions," and it is a key component of our model building process.

"No model building method can assure success. However, the lack of a proper scientific methodology will almost certainly guarantee failure." - Michael Rulle, MSR

In a purely random, lognormal world, such a framework for analysis would be redundant. By mathematical definition, one could not outperform the market's risk-adjusted return in the long run except by pure luck. The alpha of such models would be zero (negative, once transaction costs are counted). Model development would be as fruitful as attempting to make money flipping fair coins. Therefore, all developers of trading models explicitly or implicitly believe that markets are not purely random. This assumption should inspire some humility. The challenge for modelers in trying to discover patterns which repeat themselves is daunting.
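The zero-alpha claim can be checked with a small simulation. The sketch below is illustrative only and is not MSR's methodology: the rule ("follow the sign of the previous move"), the volatility, the sample size and the seed are all assumptions chosen for the example. It applies the rule to independent, zero-mean Gaussian log returns, so any profit it shows is luck by construction.

```python
import math
import random


def simulate_momentum_on_noise(n_periods=200_000, sigma=0.01, seed=7):
    """Trade a 'follow the last move' rule on i.i.d. zero-mean log returns."""
    rng = random.Random(seed)
    # Zero-drift Gaussian log returns: lognormal price changes with no pattern.
    returns = [rng.gauss(0.0, sigma) for _ in range(n_periods)]

    # Long after an up move, short after a down move; the position for
    # period t is decided using only the return of period t - 1.
    pnl = [
        math.copysign(1.0, returns[t - 1]) * returns[t]
        for t in range(1, n_periods)
    ]

    mean_pnl = sum(pnl) / len(pnl)
    variance = sum((x - mean_pnl) ** 2 for x in pnl) / (len(pnl) - 1)
    std_error = math.sqrt(variance / len(pnl))
    return mean_pnl, std_error


mean_pnl, std_error = simulate_momentum_on_noise()
print(f"mean P&L per period: {mean_pnl:.6f} (standard error {std_error:.6f})")
```

The average profit per period should be statistically indistinguishable from zero, which is the coin-flipping result described above. Subtracting any per-trade cost would push it below zero.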

No model building method can assure success. However, the lack of a proper scientific methodology will almost certainly guarantee failure. There are many hurdles model builders need to overcome. In MSR's experience, the "data mining" bias is one of the most difficult problems to solve. At its most basic level, the data mining bias is a form of self-deception that "discovers" spurious correlations in historical simulations, correlations which are fundamentally random in nature. This is the primary reason most models fail "out of sample" in real trading. As obvious as this may seem as a general statement, in practice the elimination of the data mining bias is a very complex and detailed process.

There is an unlimited number of ways to combine historical data into formulas and regressions that perfectly fit history but which lack any predictive value. The challenge for model builders is to distinguish between that which may be predictive and that which is not. Professor David Leinweber of Caltech created one of the best examples of data mining bias in a paper known for its famous satirical "butter in Bangladesh" method of predicting stock market prices. Leinweber demonstrated how easy it is to find a meaningless correlation if one scours enough data and uses enough polynomials.
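How easily a search turns up a meaningless fit can be sketched with pure noise. The example below is an illustration under stated assumptions, not Leinweber's data or procedure: the target and every candidate series are independent random numbers, and the sample length (10 observations) and the number of candidates (2,000) are arbitrary choices. No candidate has any relationship to the target, yet the best one found by searching looks impressive.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R-squared of a one-variable regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def best_spurious_fit(n_obs=10, n_candidates=2000, seed=11):
    """Search many unrelated random series for the one that best 'explains' the target."""
    rng = random.Random(seed)
    target = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    scores = []
    for _ in range(n_candidates):
        candidate = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        scores.append(r_squared(candidate, target))
    # Return the best score found by the search and the typical (average) score.
    return max(scores), sum(scores) / len(scores)


best_r2, average_r2 = best_spurious_fit()
print(f"average R-squared: {average_r2:.3f}, best of 2000: {best_r2:.3f}")
```

A typical candidate explains little of the target, while the best of the batch explains a large share of it. The gap between the two numbers is produced entirely by the act of searching.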

Leinweber literally regressed thousands of data series from 140 countries against the price of the S&P 500 over a 10-year period. He "discovered" that butter production in Bangladesh "explained" 75% of the return in the stock market. When he combined butter in Bangladesh with US cheese production and the sheep population in both countries, he created an almost perfect fit (an R-squared of 0.99).

This may seem obviously absurd, but Leinweber's point is that if instead of butter in Bangladesh one had a model predicting stock prices using GDP and interest rates with an R-squared of 0.70, it might not seem so ridiculous. A data miner can create non-predictive, meaningless models using "sensible" data just as easily as with "butter in Bangladesh".

What does MSR do to try to avoid this pitfall? One cannot avoid using historical data to "mine" for statistically significant patterns, nor should one want to. We have only one history, as multifaceted as it is. It is also unlikely that one's first attempt at a hypothesis will yield the results one desires. It is inevitable that one will use the same data multiple times in the search for a successful predictive hypothesis. In statistics this is often referred to as the multiple comparisons problem. However, if one uses hypothesis testing and other techniques on models without taking into account the number of different variables or parameters that were tested, one is almost certain to fall victim to the data mining bias. One has to account for the number of tests done on the data to arrive at meaningful statistical inferences. It is extremely difficult to build successful models without using methods which "discount" these effects. In doing so, one improves the odds that the output of one's models will not be fallacious.
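One standard way to "discount" for the number of tests is the Bonferroni correction, sketched below. The article does not say which methods MSR uses, so this is a textbook example rather than a description of their process. The function names and the five p-values are hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the overall rate at or below alpha."""
    return alpha / n_tests


def surviving_tests(p_values, alpha=0.05):
    """Indices of tests still significant after the Bonferroni adjustment."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p < threshold]


# Hypothetical p-values from five variations of one trading rule.
p_values = [0.04, 0.20, 0.008, 0.03, 0.60]

print(familywise_error_rate(0.05, 100))           # unadjusted risk over 100 tests
print(bonferroni_threshold(0.05, len(p_values)))  # adjusted per-test cutoff
print(surviving_tests(p_values))                  # tests that survive the cutoff
```

Three of the five hypothetical results pass the usual 5% cutoff when each is judged in isolation, but only one survives once the cutoff is divided by the number of tests. Bonferroni is conservative. Less strict corrections exist, but the principle of counting every test that was run is the same.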

The above model building prescription is neither straightforward nor mechanical, and in practice it is very difficult. Judgment is always required at every step. "Researcher bias" (i.e., the tendency of researchers to interpret data, or make judgments, toward their desired conclusion) is a risk for MSR as it is with all financial model builders. However, we try to keep this risk at the forefront of our thinking and methodology in order to minimize its likelihood.

Read David Leinweber's "Stupid Data Miner Tricks: Overfitting the S&P 500"



 
This article was published in Opalesque Futures Intelligence.