Hi guys, I'm Thomas. I'm a student studying applied physics at Columbia University and I have a passion for quantitative finance. This is the first real episode in a series I'm doing called The Quant Process, where I explain to retail traders the end-to-end process of creating a quantitative trading strategy, and especially of testing it rigorously before deployment.
My goal is not only to explain a more institutional framework for thinking through strategy generation, testing, and deployment, but also to explain the tools and methodologies that should be used therein. I think regardless of whether you're entering quantitative finance with no idea for a strategy, or with an idea for a strategy that you want to build programmatically, the appropriate place to start is with feature engineering. Before we jump right into things, I'll give you a high-level overview of what we're about to do.
I'm going to explain what a feature is, which at a very basic level is just some raw data collapsed into an idea that you care about. We'll get more into that in a second. I'll give a very intuitive real-world explanation of what a feature is, which will give you some intuition for why feature engineering is so important.
Then I'll explain a trading example of feature engineering. Then I'll explain feature types. So feature types by purpose or by value which will give you some intuition regarding how to actually create features for yourself or transform ideas that you are thinking of into features.
Then I'll go through exactly how to build a feature, the exact logic tree for building a feature. Then we'll go through one or two examples so that you can understand how to build a feature yourself. Okay, let's start with what is a feature.
This is a word that is kind of high context and people use it in machine learning and in academia a lot, but if you don't watch these videos or read these things, then it kind of sounds like a weird word. I'll explain it like this. So, let's zoom into a trading context for a second and think of what the goal is with trading.
Generally speaking, outside of arbitrage, the goal with trading is to use current or past information to make predictions about the future. I guess in a more technical sense, you could say that we want to find a relationship or some array of relationships that tells us that one distribution in the realm of future possibilities differs from another under specific conditions. Okay, so how do we find relationships between what's going on now and what might happen in the future?
The first thing you need to do is get some data. So let's just imagine I gather up a bunch of candlestick data. So open, high, low, close price data.
So four pieces of information per candle plus a timestamp, right? And then I try to build a model using that raw data to predict the future values of that same price data. That's going to be a terrible model if we just use the raw data and the target, the outcome that we're looking to predict, is the open, high, low, and close value of the next candle, right?
And if you're a trader who trades charts, you might immediately object to that and say, "What do you mean I can't build a good model off of open, high, low, close price data from candles like I see on my chart? I actually plot trend lines, and then I look at the daily volatility in the previous day's range, and then I have this model that looks at the RSI, and I also look at the relative volume of the day." Those are features.
Those are all transforms of the data on your chart, the open, high, low, close data except for the volume. So your trend line is a transform of that data. Your volatility indicator is a transform of that data.
The previous day's average true range is a transform of that data. Those are all features and they tell you something that you think is meaningful to the performance of your model. Feature engineering is about transforming raw data, whatever raw data you bring in, into meaningful features.
So, things that you actually care about, that you think will influence the performance of your model or have some predictive power over a given financial variable. It's about taking an unclear relationship between raw data and outcomes and making it clear by transforming that raw data into meaningful features that are related to the outcomes in an extractable way.
I think it's very hard to talk about these things and to understand them for the first time without using examples. So let me give you a very intuitive real-world example before we jump into a trading one. Imagine I want to predict airline ticket prices and I have some raw data.
So, the date that the flight is departing, the date that the ticket is purchased, the origin temperature, meaning the temperature of the place where you're taking off, and the temperature of the place where you're landing. If I gave you, a human, this data and said that I want you to predict ticket prices, you might intuitively say something like, "Oh, well, I think when the flight date is very close to the purchase date, meaning the person purchased the ticket right before the plane took off, it's probably more expensive. And when the origin temperature is much lower than the destination temperature, the flight is probably more expensive because the person's flying from somewhere cold to somewhere warm.
And if the flight date is around a holiday, the flight is probably going to be more expensive as well." You just performed feature engineering mentally. And if I gave you some training data like this data here and the outcomes that go along with it, you'd start performing multi-dimensional pattern recognition in your head just like a machine learning model.
But machines don't work that way where they can do that intermediary step of feature engineering themselves. So you have these ideas of things that you think matter but you don't know exactly how they matter and you can't mentally construct them into an optimal model to predict outcomes in your head. A machine can do that but you just have to do the step of feature engineering transforming this raw data into meaningful features like these.
Is this a holiday window? So, is this flight date within a holiday window around Christmas or around winter break or whatever it is? The temperature difference, the days until the flight as well.
That's a feature that we think is meaningful. And sometimes we'll be wrong. Obviously, we'll create features that have no predictive power over financial variables or over ticket price in this particular case.
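As a rough sketch of the flight example, here is how those three features might be computed in plain Python. The function name, the tiny holiday list, and the three-day window are all assumptions made up for illustration, not anything taken from a real dataset.

```python
from datetime import date

# Hypothetical holiday list for illustration; a real one would cover more dates.
HOLIDAYS = [date(2024, 12, 25), date(2025, 1, 1)]


def flight_features(purchase_date, flight_date, origin_temp, dest_temp, window_days=3):
    """Collapse raw ticket data into the three features discussed above."""
    days_until_flight = (flight_date - purchase_date).days
    # Positive means flying from somewhere colder to somewhere warmer.
    temp_difference = dest_temp - origin_temp
    # 1 if the flight departs within `window_days` of any listed holiday.
    is_holiday_window = any(
        abs((flight_date - holiday).days) <= window_days for holiday in HOLIDAYS
    )
    return {
        "days_until_flight": days_until_flight,
        "temp_difference": temp_difference,
        "is_holiday_window": int(is_holiday_window),
    }


features = flight_features(date(2024, 12, 20), date(2024, 12, 23), 30.0, 75.0)
print(features)
```

The raw columns never reach the model directly here. Only the three derived values do, which is the whole point of the step.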
But we engineer features that we think are meaningful. Okay, let's move on to a trading example. Imagine we have some raw data.
So, time, open, high, low, close data, candlestick data. And in an uptrend, we want to be able to say whether a 1-hour candle will constitute a reversal, a continuation, or nothing. This is a 1-hour chart.
Say we were at this point in time, and I asked you to predict whether this candle would constitute a continuation, or a reversal of this move upwards. Would you just consider the open, high, low, and close values of this candle right here to answer that question? Probably not.
Intuitively, you're probably thinking something like, "Oh, this candle has a very big wick, and it looks like we're trending a little bit sideways here, and the average true range over the last few bars is very large." Looking at this chart, your brain is engineering all of these features and trying to understand the relationship between them and the future outcome, but obviously, it's not optimal like a computer would be. So, we come over here and we engineer features that we think are meaningful.
So, the upper wick percentage: how large is the upper wick of the candle relative to the body of the candle? The average true range over the last 14 bars: how large are these bars ranging on average?
Trendiness. So, is this a very strong trend that we're in or is it kind of choppy? And the volatility regime maybe you'd consider as well.
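Here is a minimal sketch of three of those features in plain Python, assuming each candle is an `(open, high, low, close)` tuple. The average true range below is a simple unsmoothed mean rather than Wilder's smoothing, and "trendiness" is approximated with an efficiency ratio (net move divided by total path length). Both of those are choices made for illustration, not necessarily what any charting platform uses.

```python
def upper_wick_ratio(open_, high, low, close):
    """Upper wick size relative to the candle body (None if the body is zero)."""
    body = abs(close - open_)
    upper_wick = high - max(open_, close)
    return upper_wick / body if body > 0 else None


def true_range(high, low, prev_close):
    """Largest of the bar's range and its gaps from the previous close."""
    return max(high - low, abs(high - prev_close), abs(low - prev_close))


def average_true_range(candles, period=14):
    """Simple mean of the last `period` true ranges (needs at least 2 candles)."""
    true_ranges = [
        true_range(cur[1], cur[2], prev[3])
        for prev, cur in zip(candles, candles[1:])
    ]
    window = true_ranges[-period:]
    return sum(window) / len(window)


def efficiency_ratio(closes):
    """Net move over total path length: near 1 is a clean trend, near 0 is chop."""
    path = sum(abs(b - a) for a, b in zip(closes, closes[1:]))
    return abs(closes[-1] - closes[0]) / path if path > 0 else 0.0


# Made-up candles: (open, high, low, close)
candles = [(100, 102, 99, 101), (101, 104, 100, 103), (103, 107, 102, 104)]
print(upper_wick_ratio(*candles[-1]))
print(average_true_range(candles))
print(efficiency_ratio([c[3] for c in candles]))
```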
The idea of feature engineering is to take these ideas that we just saw on the chart with our eyes, where we said, "Oh, this looks like the wick is a little bit big and it's kind of trending sideways a little bit." We describe these qualitatively. And the goal of feature engineering is to turn these things quantitative.
We want to turn these hand-waving ideas like, "Oh, this wick is pretty big compared to the body," into real numbers that we can analyze. This allows us to move from super naive and terrible relationship exploration using our eyes and our human brains to analyzing relationships quantitatively using a computer. And then we'll also move on to using machine learning to do that multi-dimensional pattern recognition.
How can different features in synthesis with each other forecast outcomes, and how do we build the optimal model off of that multi-dimensional relationship? You'll notice that of these features we engineered, some of them are numbers, and then we also have these flags, these categories or buckets over here for the volatility regime.
You'll also notice that we used buckets for the labels which are also technically a feature. So let's get a sense of the universe of features, the different feature types broken down by purpose, what they're actually used for, and by value, the type of value that they hold. Let's start by breaking them down by value.
So we'll start with continuous features. This is the world of things like average true range, realized volatility, RSI, distance to a simple moving average, distance to a relevant level, VWAP, you name it. If I think the distance between price and the previous day's low has some sort of predictive value in the context of my model, that'll be a feature.
But that would likely be a feature that we normalize. And we'll talk about this a little bit more probably in the next video or later in this one. But when you have features that depend for example on price like that, you don't want the scale of them to dominate the model.
When we talk about the difference between price and the previous day's low having some sort of predictive value over another financial variable, do we really care about the raw dollar difference between these two levels? Probably not. Because for example, if we zoom back 10 years ago on the S&P, that dollar difference is going to be much smaller on average than if we come to present day when the price of the S&P is much larger.
So that dollar difference is going to be greater. Has the relationship really changed? Probably not.
So more likely we'd want to normalize that to percentages or something like that, because it's the relative difference that really matters for that feature. And if we don't account for the change in scale, then bad things can happen in the model. We'll talk about that more in a little bit.
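To make the scale problem concrete, here is a tiny sketch with made-up prices. The function name is hypothetical. It returns the distance to a level both in raw dollars and as a percentage of the level.

```python
def distance_to_level(price, level):
    """Return (raw dollar distance, percent distance) from price to a level."""
    raw = price - level                    # scales with the asset's price
    pct = (price - level) / level * 100.0  # comparable across price eras
    return raw, pct


# Made-up numbers: the same relative move at two very different price levels.
old_raw, old_pct = distance_to_level(2020.0, 2000.0)
new_raw, new_pct = distance_to_level(6060.0, 6000.0)
print(old_raw, new_raw)  # raw distances differ by a factor of three
print(old_pct, new_pct)  # percent distances are both about 1%
```

The raw feature triples just because the index got more expensive, while the percentage feature says the two situations are the same.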
So let's move on to binary or boolean features. These are true or false, or zero or one, features. They're things like flags or events.
So for example, something like are we in regular trading hours or is the regime mean reverting or is there news today or something like did this bar set a high compared to yesterday's high of day? One if yes, zero if no. Was this bar over two times the average true range over the last 14 periods?
One for yes, zero for no. These are very often used for labeling events. An ICT trader, for example, might ask something like, was liquidity swept this bar, or was this bar a fair value gap tap? That would be a boolean or binary feature, a yes or no.
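A few of those flags, sketched in plain Python. The 09:30 to 16:00 session is an assumption (US equity regular hours); the right window depends on the instrument. The function names are made up for illustration.

```python
from datetime import time


def is_regular_trading_hours(bar_time):
    """1 if the bar's time is within 09:30-16:00 (assumed session), else 0."""
    return int(time(9, 30) <= bar_time < time(16, 0))


def set_new_high(bar_high, prev_day_high):
    """1 if this bar traded above yesterday's high of day, else 0."""
    return int(bar_high > prev_day_high)


def is_wide_range_bar(bar_high, bar_low, atr, multiple=2.0):
    """1 if the bar's range exceeds `multiple` times the average true range."""
    return int((bar_high - bar_low) > multiple * atr)


print(is_regular_trading_hours(time(10, 15)))
print(set_new_high(bar_high=4510.0, prev_day_high=4505.0))
print(is_wide_range_bar(bar_high=4510.0, bar_low=4490.0, atr=8.0))
```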
Then we have ordinal features. These are things that we create buckets for. So we might rank the volatility regime between one and five.
So really low volatility, then higher, higher, higher, up to very high volatility. We might rank the trend strength bucketed from weak to strong, or the day's news impact ranked from one to three, something like this. When we want a feature to be discrete but we don't want it to be boolean, because we want more than two options, we'll use something like this.
Okay, let's zoom out and talk about features based on their purpose. So we have event-defining features.
So for example, like I was talking about before in my example for the boolean data type, a breakout candle. Was this candle a breakout? Yes or no?
That's an event-defining feature. So is an earnings release or a new session high. The purpose of these is to define which moments are part of the actual experiment.
If we wanted to answer the question of does wicking below the previous day's low have any predictive value over forward returns for example, then we would obviously need to label the bars that wick below the previous day's low so that we know what bars are part of the actual experiment. So that's a feature technically. Then we have contextual features which we saw some of earlier like volatility regime, trend strength, time of day encoding.
The purpose of these is to try to explain why the same event might behave differently across different situations. So say we just have these features, the breakout candle and forward returns. And we were trying to establish a relationship.
We can't really do much with that. For example, say you think that when a level is passed through by a candle that closes at its high, so a candle with no wick on the top or a level is closed below by a candle with no wick on the bottom, maybe you think that breakout is more likely to be a true breakout or have some predictive power over forward returns. Then you would engineer that feature.
So you would engineer how big the top or bottom wick is, the wick on the breakout side, relative to the body of the candle. Contextual features are meant to contextualize the events using whatever information you think is meaningful or could possibly explain why the event behaves differently in different circumstances. This is a lot of what retail traders tend to call discretion.
They'll see these features that they engineer in their heads without ever actually quantifying, like trend, volatility, or the wickiness of candles, and they'll formulate maybe a daily bias, or they'll just get a feeling that a trade is going to work or not work. That feeling comes from pattern recognition between features they've engineered in their heads qualitatively, but never actually quantified, and the outcome of the trade. So they'll get a feeling that this is a good trade, or a feeling that this is a bad trade that they shouldn't take, and they'll call it discretion.
And then a lot of people comment on my videos and say you can't program discretion. But in reality, your discretion is just a multi-dimensional pattern between features that you've engineered mentally, but haven't actually quantified, and the outcomes of your trades. So pretty much everything is programmable.
My point being that contextual features are supposed to be anything that you think could explain why outcomes are different when the same event occurs. What's different and meaningful between different occurrences of the event and their outcomes? Then we have reference features.
So for example, prior session highs or lows, the six-month low, the all-time high. These are features that we engineer that probably are not going to be fed to the model or actually influence the way that we're taking our trades. They're just there so we can derive other features in relation to them. The reason we have these is that, for example, when we were talking about continuous features I mentioned the distance to the previous session low.
Well, you can't define that feature without already having a reference feature for the prior session low. Otherwise there's no low to compute the difference from. Then we have outcomes.
So we need to label our outcomes. What actually happened? Some examples of this could be a barrier based outcome.
So something like the event happens and then we define outcomes based on which one of two barriers similar to a stop-loss and a take-profit are hit first or a time barrier. Or we could label outcomes based on forward returns. So for example, when the event happens, we are looking at the next 10 bars of returns or the next 5 hours of returns or the next two weeks of returns, whatever it is.
Or similarly, we could look at forward volatility. The purpose of these is to define what happened after the event. And this is the thing that we're trying to predict.
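Here is a minimal sketch of two of those outcome labels in plain Python. The barriers are placed a multiple of the average true range above and below the entry price, and the list of future bars running out acts as the time barrier. When a single bar touches both barriers, this sketch assumes the lower one was hit first, which is just one conservative convention and not the only reasonable one. All names are hypothetical.

```python
def double_barrier_label(entry, atr, future_bars, multiple=1.0):
    """+1 if the upper barrier is hit first, -1 if the lower one is,
    0 if neither is hit before `future_bars` runs out (the time barrier).

    `future_bars` is a list of (high, low) tuples after the event bar.
    """
    upper = entry + multiple * atr
    lower = entry - multiple * atr
    for high, low in future_bars:
        if low <= lower:   # assumption: lower barrier wins ties within a bar
            return -1
        if high >= upper:
            return 1
    return 0


def forward_return(entry, future_closes, horizon):
    """Simple return from entry to the close `horizon` bars later."""
    return future_closes[horizon - 1] / entry - 1.0


print(double_barrier_label(100.0, 2.0, [(101.0, 99.5), (102.5, 100.0)]))
print(forward_return(100.0, [101.0, 102.0, 105.0], horizon=3))
```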
Okay, let's move on. How to build a feature. At this point, we've talked about what a feature is.
We've given a real-world intuitive example of features using the example of predicting airline ticket prices, and we've looked at a trading example as well. Then we've described how to categorize features by their value types and also by their purpose. We've also talked a little bit about the very misunderstood retail idea of discretion, which is really just mental feature engineering and multi-dimensional pattern recognition in disguise.
If you can transcribe the features that you consider mentally in your own discretion into actual quantitative features, then a computer can do that multi-dimensional pattern recognition better than your human brain can. So let's move on to how to actually build a feature. Let's go through this little logic tree that I put together.
Keeping in mind that the goal of feature engineering is to collapse raw data into the thing we actually care about in the right data type. So the first thing we have to ask ourselves is what do we care about? Some examples: urgency, rejection, regime, distance, imbalance, etc.
What raw data contains traces of that thing? So, for example, say we're building an intraday breakout trading strategy. And the things we care about are rejection.
So, whether that level is going to be rejected or not. We also care about momentum, and maybe we care about market regime. So, are we in a choppy or a trending regime? I think that influences how these breakouts are going to move.
Let's take that last example that I just said, the regime, whether we're trending or choppy, because I think that's something important and I want to engineer a feature or some features around it so that we can actually quantify the relationship between the market regime, trending or mean reverting, let's say, and the success rate of breakouts. We could look at something like the efficiency of the trend or the slope of some moving average to see how steep the trend is. Things like this are all features that are going to be derived from price.
We can calculate them directly just from that price data on our screen. We don't need any volume data, time data, economic data, news, etc. What kind of information should our feature be?
So, let's go with this same example that we're talking about, the market regime as it relates to trendiness. You don't necessarily have to get this right the first time. You can iterate, but you might go with continuous at first and look at, for example, the efficiency of the trend and see how that influences the success of breakouts.
Or like I mentioned before, you might be looking to classify the market into mean reverting or trending. So that might be a binary feature. Or if you wanted to engineer an ordinal feature, it could be -2 for a strong downward trend, -1 for a downward trend, zero for mean reverting or choppy.
You understand the point. The idea is to think about what about the information actually matters. So do we just care whether the market can be classified as mean reverting or trending?
Is that the only thing we care about? Or do we care about exactly how efficient the trend is? Or do we care about whether it's trending strongly, trending lightly, or chopping?
And we could well iterate through all three of those ideas and see which one has the most bearing over the outcomes of our breakout strategy. The next thing we have to think about is what transformation makes that concept explicit. So let's say at the previous step we chose to build a continuous feature and the feature we chose was the slope of the simple moving average.
When we talk about the slope of the moving average, the thing that I'm thinking about that might matter is just how upwards is it pointing, how downwards is it pointing, right? That would be the qualitative way probably to describe what you think matters about that feature. So, does the actual number which can be blown up with the scale of the price of the asset itself actually matter or do we just care really about how steeply it's pointing down?
How steeply it's pointing up? Probably the latter. So, we're going to want to normalize that feature.
The way I explained that was probably not very intuitive and definitely sounded more complicated than it actually is, but I'm kind of struggling to find an elegant way to put this. So, I'm just going to show you an example. Okay, I just made us this little indicator. I almost doxxed myself there.
Okay, so here's an indicator that we're looking at that shows the slope of a moving average. Let me actually also pull this up. Okay, so this indicator reports the slope of the moving average.
When the moving average is pointed very steeply upwards, this will be higher. When it's pointed downwards, this will be lower. When it's pointed directly sideways, it'll be at this midline at zero.
When we report the raw slope of the simple moving average, we're looking at how much it moved in dollars divided by the change in time. When the asset is very expensive, the slope is going to have wilder swings than when the asset is very cheap. So that means that the slope of this moving average if we report it raw is dependent also on the price of the asset.
But what if we zoom out and look as this same symbol gets cheaper further back in time? The swings are smaller. The feature that we're thinking about is something that should not get much more erratic as the asset gets more expensive, because that really has nothing to do with the idea that we were trying to capture in the feature.
Right? So that's where normalization comes into play. So if we transform this to normalized or percentage based then we see throughout history as the price of the asset is really deflated or really inflated it's not changing how rangy this moving average slope is.
Now that we've normalized the feature to look at the moving average slope in terms of percentage instead of raw dollar values, its range becomes much steadier throughout time. And now it really captures the idea that we were trying to capture when we decided to engineer this feature, which is just how steep the moving average is. If for some reason that didn't click with you, just look up feature normalization for financial machine learning; that will probably pull up the best results.
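If it helps to see it in numbers, here is a small sketch of the raw versus percentage slope of a simple moving average, in plain Python with made-up closes. This is not the code behind the indicator on screen. The function names and the one-bar lookback are assumptions for illustration.

```python
def sma(values, period):
    """Simple moving average of the last `period` values."""
    return sum(values[-period:]) / period


def sma_slope(closes, period, lookback=1):
    """Return (raw slope in price units per bar, slope in percent per bar).

    Needs at least `period + lookback` closes.
    """
    now = sma(closes, period)
    then = sma(closes[:-lookback], period)
    raw = (now - then) / lookback
    pct = (now / then - 1.0) * 100.0 / lookback
    return raw, pct


# Same shape of move at two price scales (made-up numbers).
cheap = [10.0, 11.0, 12.0, 13.0]
expensive = [100.0, 110.0, 120.0, 130.0]
cheap_raw, cheap_pct = sma_slope(cheap, period=3)
exp_raw, exp_pct = sma_slope(expensive, period=3)
print(cheap_raw, exp_raw)  # raw slope is ten times larger for the expensive series
print(cheap_pct, exp_pct)  # percent slope is the same for both
```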
Okay, let's think about something else, like a ratio. Maybe I won't go into such great depth on this, but for example, say we want to identify whether a breakout was violent. Maybe we'll say a breakout is violent when it looks like this one, where we have this range here and the average true range gets much lower, and then, boom, it breaks out and the average true range all of a sudden ticks up a bunch, right?
We'll call that an aggressive breakout. So say we wanted to engineer a feature that can rank the aggression of a breakout. To put it in more qualitative English terms, that would be something like: the range that we saw really recently is much greater than the range that we saw over a somewhat larger window.
So the range that we saw right here is much bigger than the average range that we were seeing back here over this longer window. So what we might look at is a ratio of two average true ranges, one over a short window and one over a longer window like this, because just creating two features that are those two individual average true ranges doesn't effectively collapse the relationship that we're thinking about and trying to capture in a feature.
What would is a ratio. So we take a short average true range and a long average true range and see what the ratio of them is. That would tell us the short-window average true range is much larger than the long-window average true range right as we broke through a level.
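A minimal sketch of that ratio, assuming you already have a list of per-bar true ranges. The window lengths of 5 and 20 are placeholders, not recommendations, and the function name is made up.

```python
def atr_ratio(true_ranges, short=5, long=20):
    """Short-window ATR divided by long-window ATR (simple means).

    Values well above 1 mean recent bars are ranging much more than usual.
    Needs at least `long` true ranges.
    """
    short_atr = sum(true_ranges[-short:]) / short
    long_atr = sum(true_ranges[-long:]) / long
    return short_atr / long_atr


quiet = [1.0] * 20                 # steady ranges, nothing happening
breakout = [1.0] * 15 + [3.0] * 5  # tight range, then five big bars
print(atr_ratio(quiet))
print(atr_ratio(breakout))
```

One number now carries the relationship between the two windows, instead of leaving the model to work it out from two separate columns.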
So that's a feature that captures the idea that a violent move just happened. Differences you can probably understand also. Something like the distance between price and a simple moving average, like the one that we have on the screen right here, we might think is a feature that indicates the beginning of a trend, an acceleration of a trend, something like that.
So that's an example of when we would use a difference, and we'd probably also normalize this because we probably care about the percentage difference between the price and the simple moving average. Okay, so encoding is a broader category of transformation that describes when we do more than one of these things, or these things in tandem with another thing. So I'll give you an example.
We just talked about that feature that is the percentage distance from a simple moving average to price. Now say I wanted to build a feature that describes how far price is from this moving average relative to normal. I would want to use something like a z-score.
So instead of trying to explain this one, I'm just going to build an example again. So let's just quickly look at this indicator. The idea that this captures now is how rare the normalized distance between price and this simple moving average is.
So when the z-score is at three, that means this relative distance almost never happens. When the z-score is around zero, it means this is very, very normal. So the point is this feature that we've engineered captures the rarity of the normalized difference between the simple moving average and price.
That's an example of encoding. If that doesn't make sense, I'll do some explicit examples of different types of encoding in the next video. But I think that'll do for now and we'll keep moving.
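As a rough sketch of that two-step encoding, here it is in plain Python: first the percentage distance to the moving average, then a z-score of the latest value against recent history. Scoring the latest value against the previous values only, and using the population standard deviation, are assumptions. The indicator shown on screen may do either differently.

```python
from statistics import mean, pstdev


def pct_distance_to_sma(closes, period):
    """Percent distance between each close and its trailing SMA.

    The first value corresponds to bar index `period - 1`.
    """
    out = []
    for i in range(period - 1, len(closes)):
        avg = mean(closes[i - period + 1 : i + 1])
        out.append((closes[i] - avg) / avg * 100.0)
    return out


def zscore_of_latest(values, window):
    """How unusual the latest value is versus the previous `window` values."""
    history = values[-window - 1 : -1]
    spread = pstdev(history)
    return (values[-1] - mean(history)) / spread if spread > 0 else 0.0


distances = pct_distance_to_sma([10.0, 10.0, 10.0, 11.0], period=2)
print(distances)
print(zscore_of_latest([1.0, 2.0, 3.0, 4.0, 5.0, 9.0], window=5))
```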
Okay, looks like I duplicated what type of data should it be. We already talked about this up here. The features that we built briefly here and then visualized with indicators were contextual features.
So maybe for this Python example, we'll do an event-defining feature. Something like a sweep of a previous session high. I know people love to talk about sweeps of highs.
So maybe a wick above a high or something like that. And while we're at it, we'll also do an outcome feature. So maybe we'll use a double-barrier label to define the outcomes of the event that we build.
Okay. So, I'm going to use quantpad to build this feature. Okay.
Previous day low and previous day high sweeps. Let me write this prompt. Okay, I just realized that I didn't specify the symbol or the date range that I wanted to look at, but it shows ES and 2023.
All of 2023. Okay, so I've just run this. Let's look for the plots.
Okay, so I think visualizing and validating your features, making sure that they are exactly what you want them to be is extremely important in feature engineering. My point being always visualize some examples of your features. So here's an example of a sweep of the previous day's low.
So we wick below, come back up, and this was labeled as plus one. So that was a sweep of a low. Here is one that was labeled as minus one.
So this is a sweep of a high. We wicked above, came back down and closed, and then we wound up breaking through, and it was not a reversal. The other one, wow, the other one actually was totally a reversal.
That's pretty coincidental. Okay, now before we come up and look at the statistics of the barrier-labeled outcomes of these events, I also meant to ask for the barriers to be plotted. It would be great to see the barriers that we used to label the outcome of this event.
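For reference, the sweep event itself could be sketched like this in plain Python. This is not the code the tool generated. The definition (wick through the level, close back on the original side) and the function name are assumptions, with the sign convention matching the plots: plus one for a sweep of the low, minus one for a sweep of the high.

```python
def sweep_label(bar, prev_day_high, prev_day_low):
    """Label a bar given as an (open, high, low, close) tuple.

    +1: wicked below the previous day's low and closed back above it.
    -1: wicked above the previous day's high and closed back below it.
     0: neither. If a bar does both, this sketch reports the low sweep.
    """
    _, high, low, close = bar
    if low < prev_day_low and close > prev_day_low:
        return 1
    if high > prev_day_high and close < prev_day_high:
        return -1
    return 0


print(sweep_label((100.0, 101.0, 97.5, 99.0), prev_day_high=105.0, prev_day_low=98.0))
print(sweep_label((104.0, 106.0, 103.0, 104.5), prev_day_high=105.0, prev_day_low=98.0))
```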
Okay, so now for both events, we have the double barriers plotted. This one hit the top barrier. This one hit the top barrier as well.
And these are average true range based. We could tune the average true range multiple, but I'm not going to do that now. Now that we've verified this behavior, let's go and look at the statistics up here.
We actually have fewer events than I figured we would get in one full year, but we'll still go through the statistics and then we can run it once more with maybe 5 years of data. Okay, so the expected value of each case tells us the average return of each case. So in the bearish case, it is negative, price goes down.
In the bullish case, it is positive, price goes up. But that's what we observed just in this experiment which could have luck involved. So we use confidence intervals to better answer the question, how likely is it that the true expected value of each of these cases falls within a given range?
For the bullish case, the 95% confidence interval for the true expected value is this range, which contains negative values. So we're forced to call the bullish case statistically indistinguishable from zero, whereas the interval for the bearish case is entirely negative. So this is a statistically significant negative expected value.
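One common way to get an interval like that is a normal approximation around the sample mean, sketched below in plain Python with made-up returns. How the tool on screen computes its intervals isn't stated, so treat this as one reasonable method and not the method used here. A bootstrap would be another option.

```python
from math import sqrt
from statistics import mean, stdev


def mean_confidence_interval(returns, z=1.96):
    """Approximate 95% confidence interval for the mean return.

    Uses mean +/- z * standard error, which leans on the sample being
    reasonably large. With few events, treat the result with caution.
    """
    center = mean(returns)
    standard_error = stdev(returns) / sqrt(len(returns))
    return center - z * standard_error, center + z * standard_error


def is_significant(interval):
    """True only if the whole interval sits on one side of zero."""
    low, high = interval
    return low > 0 or high < 0


bearish = [-1.0, -2.0, -3.0, -2.0, -2.0]  # made-up outcomes
bullish = [1.0, -1.0, 2.0, -2.0, 0.5]     # made-up outcomes
print(mean_confidence_interval(bearish), is_significant(mean_confidence_interval(bearish)))
print(mean_confidence_interval(bullish), is_significant(mean_confidence_interval(bullish)))
```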
Now like I said, this sample size is modest. So, let's go back up here and adjust the date range. Let's go all the way from 2023 to 2025 or we'll do maybe from 2022 to 2025.
In this larger sample, we have 512 events. So, we are still looking... huh, that's interesting. Over this larger window of time and this different sample, the 95% confidence interval of the expected value of the bearish case is no longer negative at both bounds.
So we know the expected value was statistically negative in the year 2023 that we tested initially, but over this larger window we can't say the same. So we have one event that seems to lead to different outcomes over the different time periods that we test. So how do we explain the variability in outcomes of the same event?
That's exactly what contextual features are for. And this is starting to get at the bigger picture. We have a well-defined event.
We have a well-defined outcome. And we have a systematic and quantitative way of measuring how that event affects the outcome. Now we have to see what happens when we condition on other things to contextualize those trades.
What else might matter in predicting the outcome of this event? Is it the previous day's range? Is it the size of the wick in comparison to the body that wicks through?
Is it any number of other things? And then we can start to build features for all of those to contextualize the events. We can use statistical analysis and machine learning to uncover the multi-dimensional and nonlinear relationships that might exist between different contextual features, which together might help predict outcomes of this event better.
As we engineer more meaningful features, the goal is to paint a more vivid picture about what actually matters in predicting these outcomes. And this is what begins the real process of developing a quantitative trading strategy.
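The simplest version of conditioning on a contextual feature is to group the events by that feature and compare average outcomes, sketched here in plain Python. The events, the `vol_regime` key, and the outcome numbers are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_outcome_by_bucket(events, feature, outcome="outcome"):
    """Average outcome of the same event, split by one contextual feature."""
    groups = defaultdict(list)
    for event in events:
        groups[event[feature]].append(event[outcome])
    return {bucket: mean(values) for bucket, values in groups.items()}


# Made-up events: same sweep event, labeled with a hypothetical volatility regime.
events = [
    {"vol_regime": "low", "outcome": 1.0},
    {"vol_regime": "low", "outcome": 0.5},
    {"vol_regime": "high", "outcome": -1.0},
    {"vol_regime": "high", "outcome": -0.5},
]
print(mean_outcome_by_bucket(events, "vol_regime"))
```

In this toy data the overall average outcome is zero, but the two regimes point in opposite directions, which is the kind of hidden structure contextual features are meant to expose.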