Opalesque Futures Intelligence

Guest Article / CTA Strategy Thoughts:

Mathematical Modeling for Quantitative Trading Strategies

Monday, May 07, 2012

By Michael S Rulle, Jr.

There is only one history in financial markets, but there is an almost infinite number of time series one can analyze. Think of all the combinations of markets, units of time (for example, one second, one minute, one hour, etc.) and periods of time within which these units reside (for example, one day, one week, one month, etc.). We characterize this way of viewing time series as analysis of "the distribution of distributions", and it is a key component of our model-building process.
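One way to make "the distribution of distributions" concrete is to pick a unit of time and a period, compute the returns over each unit inside each period, summarise each period's distribution (here its mean and volatility), and then look at how those summaries are themselves distributed across periods. The sketch below is our own illustration, not MSR's procedure: the function names `unit_returns`, `per_period_stats` and `summarise` are hypothetical, the prices are a synthetic random walk, and volatility quantiles are just one possible summary.

```python
import numpy as np


def unit_returns(log_prices: np.ndarray, unit: int) -> np.ndarray:
    """Non-overlapping log returns, each spanning `unit` base-resolution steps."""
    return np.diff(log_prices[::unit])


def per_period_stats(returns: np.ndarray, units_per_period: int) -> np.ndarray:
    """Mean and sample standard deviation of the unit returns inside each whole period.

    Returns an array of shape (n_periods, 2); a trailing partial period is dropped.
    """
    n_periods = len(returns) // units_per_period
    blocks = returns[: n_periods * units_per_period].reshape(n_periods, units_per_period)
    return np.column_stack([blocks.mean(axis=1), blocks.std(axis=1, ddof=1)])


def summarise(log_prices, unit, period_steps, quantiles=(0.1, 0.5, 0.9)):
    """Quantiles, taken across periods, of the per-period volatility of unit returns."""
    returns = unit_returns(log_prices, unit)
    stats = per_period_stats(returns, period_steps // unit)
    return np.quantile(stats[:, 1], quantiles)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    minutes_per_day = 60 * 24
    # Synthetic one-minute log prices: a driftless Gaussian random walk (an assumption).
    log_prices = np.cumsum(rng.normal(0.0, 1e-4, size=200 * minutes_per_day))
    for unit in (1, 15, 60):  # one-minute, 15-minute and one-hour returns
        for label, period_steps in (("day", minutes_per_day), ("week", 7 * minutes_per_day)):
            q10, q50, q90 = summarise(log_prices, unit, period_steps)
            print(f"unit={unit:>2} min, period={label:<4}: "
                  f"vol q10={q10:.2e} q50={q50:.2e} q90={q90:.2e}")
```

Each (market, unit, period) combination yields its own cross-period distribution, which is why the number of series to examine grows so quickly.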


In a random, lognormal world, such a framework for analysis would be redundant. By mathematical definition, one could not outperform the market's risk-adjusted return in the long run except by pure luck. The alpha of such models would be zero (worse, after transaction costs). Model development would be as fruitful as trying to make money by flipping fair coins. Therefore, all developers of trading models explicitly or implicitly believe that markets are not purely random. This assumption should inspire some humility. The challenge modelers face in discovering patterns that repeat themselves is daunting.
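The coin-flipping point can be checked numerically. The sketch below assumes i.i.d. Gaussian log returns with zero drift (a lognormal random walk) and applies a hypothetical one-period momentum rule; the rule, the 1% volatility and the flat 1 basis point cost per position flip are illustrative assumptions, and costs are simply subtracted in log-return units as a simplification. On such data the gross mean P&L should sit within sampling error of zero, and costs push it below zero.

```python
import numpy as np


def simulate_log_returns(n: int, sigma: float, seed: int = 0) -> np.ndarray:
    """i.i.d. Gaussian log returns with zero drift, i.e. a lognormal random-walk price (an assumption)."""
    return np.random.default_rng(seed).normal(0.0, sigma, size=n)


def momentum_pnl(log_returns: np.ndarray, cost_per_change: float = 0.0) -> np.ndarray:
    """Per-period P&L of a hypothetical rule: long after an up period, short after a down period.

    P&L is measured in log-return units; a flat cost is subtracted whenever the position
    flips (a simplification of real transaction costs).
    """
    position = np.sign(log_returns[:-1])   # decided with information up to t-1
    position[position == 0] = 1.0          # treat an exactly flat period as "up"
    gross = position * log_returns[1:]     # earned on the following period's return
    flips = np.abs(np.diff(position, prepend=position[0])) / 2.0  # 1.0 on each flip, else 0.0
    return gross - cost_per_change * flips


if __name__ == "__main__":
    r = simulate_log_returns(1_000_000, sigma=0.01)
    gross, net = momentum_pnl(r), momentum_pnl(r, cost_per_change=1e-4)
    se = gross.std(ddof=1) / np.sqrt(len(gross))
    print(f"gross mean {gross.mean():+.2e} (std. error {se:.1e}), net mean {net.mean():+.2e}")
```

Because the position is fixed before the next return is drawn and that return has mean zero, no rule of this kind can have a positive expected gross return; only luck shows up in any finite sample.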

No model-building method can assure success. However, the lack of a proper scientific methodology will almost certainly guarantee failure. Model builders need to overcome many hurdles. In MSR's experience, the "data mining" bias is one of the most difficult problems to solve. At its most basic level, the data mining bias is a form of self-deception: it "discovers" correlations in historical simulations that are in fact spurious and fundamentally random. This is the primary reason most models fail "out of sample" in real trading. Obvious as this may seem as a general statement, in practice eliminating the data mining bias is a complex and detailed process.
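The out-of-sample failure described above can be reproduced on pure noise. In this sketch (our own illustration, not MSR's process), 1,000 hypothetical rules each go long or short on a randomly weighted sum of the last five returns; the rule with the best in-sample Sharpe ratio is "discovered" and then evaluated on fresh data from the same random process. Sample sizes, the five-lag window and the rule family are all assumptions, and Sharpe ratios are per period rather than annualised.

```python
import numpy as np


def lagged_matrix(returns: np.ndarray, k: int) -> np.ndarray:
    """Row i holds the k returns preceding returns[k + i], most recent first."""
    n = len(returns)
    return np.column_stack([returns[k - j - 1 : n - j - 1] for j in range(k)])


def sharpe(pnl: np.ndarray) -> np.ndarray:
    """Per-period (not annualised) Sharpe ratio of each column."""
    return pnl.mean(axis=0) / pnl.std(axis=0, ddof=1)


def mining_experiment(n_in=1000, n_out=1000, n_rules=1000, k=5, seed=0):
    """Pick the best of many hypothetical rules in sample; report its Sharpe in and out of sample."""
    rng = np.random.default_rng(seed)
    returns = rng.normal(0.0, 0.01, size=n_in + n_out + k)  # pure noise: nothing to find
    X, y = lagged_matrix(returns, k), returns[k:]
    weights = rng.normal(size=(k, n_rules))   # each column defines one candidate rule
    positions = np.sign(X @ weights)          # long/short on a weighted sum of past returns
    pnl = positions * y[:, None]
    in_sample, out_of_sample = sharpe(pnl[:n_in]), sharpe(pnl[n_in:])
    best = int(np.argmax(in_sample))          # the "discovery"
    return in_sample[best], out_of_sample[best]


if __name__ == "__main__":
    results = np.array([mining_experiment(seed=s) for s in range(20)])
    print(f"chosen rule, mean in-sample Sharpe:   {results[:, 0].mean():.3f}")
    print(f"same rule, mean out-of-sample Sharpe: {results[:, 1].mean():.3f}")
```

The in-sample figure is inflated purely by selection; out of sample the chosen rule's expected Sharpe ratio is zero, because the data contain no pattern to find.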

There is an unlimited number of ways to combine historical data into formulas and regressions that fit history perfectly but lack any predictive value. The challenge for model builders is to distinguish what may be predictive from what is not. Professor David Leinweber of Caltech created one of the best examples of data mining bias in a paper famous for its satirical "butter in Bangladesh" method of predicting stock market prices. Leinweber demonstrated how easy it is to find a meaningless correlation if one scours enough data and uses enough polynomials.
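The "enough polynomials" half of the argument is easy to demonstrate: fitting ever higher-degree polynomials in time to a short noise series drives the in-sample R-squared towards 1, and with ten points a degree-9 polynomial fits them exactly. This is an illustrative sketch on a synthetic random walk, not a reproduction of Leinweber's calculations; the ten "annual" observations and the helper names are assumptions.

```python
import numpy as np
from numpy.polynomial import Polynomial


def r_squared(y: np.ndarray, fitted: np.ndarray) -> float:
    """Share of the variance of y 'explained' by the fitted values."""
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def in_sample_r2_by_degree(y: np.ndarray, degrees) -> dict:
    """Least-squares polynomial fit of y against time for each degree; in-sample R-squared."""
    t = np.arange(len(y), dtype=float)
    return {d: r_squared(y, Polynomial.fit(t, y, d)(t)) for d in degrees}


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = np.cumsum(rng.normal(size=10))  # ten hypothetical "annual" index levels, pure noise
    for degree, r2 in in_sample_r2_by_degree(y, range(10)).items():
        print(f"degree {degree}: in-sample R-squared = {r2:.3f}")
```

Because each higher-degree model contains the lower-degree ones, in-sample R-squared can only rise as parameters are added; it says nothing about predictive value.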

Leinweber literally regressed thousands of data series from 140 countries against the price of the S&P 500 over a 10-year period. He "discovered" that butter production in Bangladesh "explained" 75% of the variation in the stock market. When he combined butter in Bangladesh with US cheese production and the sheep population in both countries, he created an almost perfect fit (an R-squared of .99).
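The search itself can be mimicked with synthetic data. In the sketch below, 2,000 unrelated random walks stand in for Leinweber's international data series and a random walk of ten "annual" values stands in for the S&P 500; a greedy forward search then picks the one, two and three series that maximise in-sample R-squared. The data, the annual sampling and the forward-selection procedure are all assumptions for illustration; Leinweber's exact method is not reproduced here.

```python
import numpy as np


def ols_r2(y: np.ndarray, X: np.ndarray) -> float:
    """R-squared of an OLS regression of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(1.0 - resid @ resid / np.sum((y - y.mean()) ** 2))


def forward_search(y: np.ndarray, candidates: np.ndarray, n_vars: int):
    """Greedily add whichever candidate series raises the in-sample R-squared most."""
    chosen, path = [], []
    for _ in range(n_vars):
        scores = {j: ols_r2(y, candidates[:, chosen + [j]])
                  for j in range(candidates.shape[1]) if j not in chosen}
        best = max(scores, key=scores.get)
        chosen.append(best)
        path.append(scores[best])
    return chosen, path


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    n_obs, n_series = 10, 2000                                    # ten "annual" observations
    index = np.cumsum(rng.normal(size=n_obs))                     # stand-in for the S&P 500 level
    junk = np.cumsum(rng.normal(size=(n_obs, n_series)), axis=0)  # unrelated random walks
    chosen, path = forward_search(index, junk, n_vars=3)
    for k, r2 in enumerate(path, start=1):
        print(f"best {k}-series model: R-squared = {r2:.2f}")
```

With only ten observations and thousands of candidates, very high R-squared values appear by chance alone, which is the essence of the "butter in Bangladesh" result.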

This may seem obviously absurd, but Leinweber's point is that if, instead of butter in Bangladesh, one had a model predicting stock prices from GDP and interest rates with an R-squared of .70, it might not seem so ridiculous. A data miner can create meaningless, non-predictive models from "sensible" data just as easily as from "butter in Bangladesh".

What does MSR do to try to avoid this pitfall? One cannot avoid using historical data to "mine" for statistically significant patterns, nor should one want to. We have only one history, multifaceted as it is. It is also unlikely that one's first hypothesis will yield the desired results, so one will inevitably use the same data multiple times in the search for a successful predictive hypothesis. In statistics, this is often referred to as the multiple comparisons problem. However, if one applies hypothesis tests and other techniques to models without taking into account the number of variables or parameters that were tested, one is almost certain to fall victim to the data mining bias. One has to account for the number of tests performed on the data to arrive at meaningful statistical inferences. It is extremely difficult to build successful models without methods that "discount" these effects; using them improves the odds that a model's output will not be fallacious.
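Two small calculations show why the number of tests matters. First, if 100 independent tests of worthless ideas are each run at a 5% significance level, the chance that at least one looks "significant" is about 99.4%. Second, a correction such as Holm-Bonferroni raises the bar for each test according to how many were run. Holm-Bonferroni is one standard textbook method chosen here for illustration; the article does not say which "discounting" methods MSR uses, and the p-values below are made up.

```python
import numpy as np


def family_wise_error(m: int, alpha: float) -> float:
    """Probability of at least one false positive among m independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** m


def holm_bonferroni(p_values, alpha: float = 0.05) -> np.ndarray:
    """Holm's step-down procedure: a boolean 'reject' flag per hypothesis, controlling FWER at alpha."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    reject = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(np.argsort(p)):
        if p[idx] > alpha / (m - rank):
            break  # this and every larger p-value are retained
        reject[idx] = True
    return reject


if __name__ == "__main__":
    print(f"FWER for 100 tests at 5%: {family_wise_error(100, 0.05):.3f}")
    p = [0.01, 0.04, 0.03, 0.005]  # hypothetical p-values from four strategy tests
    print("naive:", [x <= 0.05 for x in p])
    print("Holm: ", holm_bonferroni(p).tolist())
```

Tested naively, all four hypothetical p-values pass at 5%; after Holm's adjustment only the two smallest survive, because each test is judged against a threshold that reflects how many comparisons were made.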

The above model-building prescription is neither straightforward nor mechanical, and in practice it is very difficult. Judgment is required at every step. "Researcher bias" (i.e., the tendency of researchers to interpret data, or make judgments, in favor of their desired conclusion) is a risk for MSR, as it is for all financial model builders. However, we try to keep this risk at the forefront of our thinking and methodology in order to minimize it.

Read David Leinweber's "Stupid Data Miner Tricks: Overfitting the S&P 500"

This article was published in Opalesque Futures Intelligence.