Once upon a time, not so long ago, you could walk into a betting shop and get a price on any football team you wanted, as long as you wanted to back that team to win. You couldn’t possibly back a team to lose, no sirree, no way, that would be … immoral, and wanting to do so would likely bring a plague down upon your house, involving frogs, locusts and all manner of other unpleasantries.
Or at least so said Hills and Ladbrokes.
Thankfully the High Court was persuaded that in a two-horse race, laying team A was functionally equivalent to backing team B, and the big bookmakers were sent away with their tails between their legs; although they kept coming back for about ten years, so you can’t fault them for persistence. Meanwhile Betfair emerged, grew like a weed, and enabled punters to back and lay to their hearts’ content, until ... well, liquidity dried up, they hired a bookmaker as CEO and decided that being a sportsbook was in fact the way to go after all.
But that’s a sad story for another day. By now, the laying genie was well and truly out of the bottle, and there was to be no stuffing him back in. And whilst Betfair popularised the concept, it’s often forgotten that the idea was pioneered by the spread betting firms, before the exchange was even a gleam in Andrew Black’s eye.
Yes, time was when every bookmaker wanted to be a spread betting firm. Ladbrokes Index, William Hill Index, City Index, IG Spread, Spreadfair ... all great firms (ahem), all now gone to the great bookie in the sky at the hands of Betfair’s reaping scythe. The exchange’s genius was to allow laying on a fixed-odds product, which was well understood by punters, rather than inventing a new class of product which required a credit account.
And so now the spread betting industry is in a pretty sorry state, with only two firms (Sporting Index and Spreadex) remaining. They do however make some interesting markets, at least from a modelling perspective; stuff which isn’t available elsewhere and which, as we’ll see, is rich in informational content.
Take Season Points – Premier League Season points are available here. The interesting thing about Season Points is that you can get a meaningful, equivalent price for every team in a league. £100 per point on Man City is, risk-wise, precisely equivalent to £100 per point on Liverpool. Where else can you place these kinds of bet? Not in the Winner market. £100 at 2/1 on Man City to win is risk-wise a very different proposition from £100 on Liverpool to win at 10/1; and good luck getting a meaningful price on Crystal Palace.
Given the fact that the Season Points market is ‘complete’ (i.e. equivalent prices for every team), we can do some interesting stuff with just a small amount of code and a few distributional assumptions.
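To make the risk equivalence concrete, here is a minimal sketch of how a Season Points spread bet settles. It assumes the usual spread betting convention: a buyer trades at the offer and gains the stake for every point the final total finishes above it, while a seller trades at the bid and gains for every point below it. The function name and the final total of 85 points are hypothetical, chosen purely for illustration.

```python
def spread_bet_pnl(side, stake_per_point, bid, offer, final_points):
    """Profit or loss on a season points spread bet (hypothetical helper)."""
    if side == "buy":
        # Buyers trade at the offer and profit if the team finishes above it.
        return stake_per_point * (final_points - offer)
    if side == "sell":
        # Sellers trade at the bid and profit if the team finishes below it.
        return stake_per_point * (bid - final_points)
    raise ValueError("side must be 'buy' or 'sell'")

# Illustrative only: a 78.0 / 79.5 quote and a made-up final total of 85 points.
print(spread_bet_pnl("buy", 100, 78.0, 79.5, 85))   # 550.0
print(spread_bet_pnl("sell", 100, 78.0, 79.5, 85))  # -700.0
```

Because every team is quoted in the same unit (points), a one-point move is worth the same £100 whichever team you trade, which is what makes the prices comparable across the whole league.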
Let’s get the data from Sporting Index, who are kind enough to make it available via an (undocumented, natch) JSON feed; although you have to grab the name/id mapping from the web page in a separate call to make sense of it.
import json, re
import lxml.html
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib import urlopen  # Python 2

LivePricingUrl="http://livepricing.sportingindex.com/LivePricing.svc/jsonp/GetLivePricesByMeeting?meetingKey="

def get_market_quotes(url):
    # scrape the id -> team name mapping from the market web page
    doc=lxml.html.fromstring(urlopen(url).read())
    ids=dict([(li.attrib["key"], re.sub(" Points$", "", li.xpath("span[@class='markets']")[0].text))
              for li in doc.xpath("//ul[@class='prices']/li")
              if "key" in li.attrib])
    # the meeting key is the second-to-last segment of the page url
    quotes=json.loads(urlopen(LivePricingUrl+url.split("/")[-2]).read())
    # "SoFar" is split on "/" into a (points, games played) tuple
    return [{"name": ids[quote["Key"]],
             "so_far": tuple([int(tok) for tok in quote["SoFar"].split("/")]),
             "bid": quote["Sell"],
             "offer": quote["Buy"]}
            for quote in quotes["Markets"]]
Let’s see what the data looks like:
import pandas as pd

MarketQuotes=get_market_quotes("http://www.sportingindex.com/spread-betting/football-domestic/premier-league/mm4.uk.meeting.4191659/premier-league-points-2013-2014")
# sort by mid price, highest first
print(pd.DataFrame(sorted(MarketQuotes, key=lambda row: -(row["bid"]+row["offer"])/2.0), columns=["name", "bid", "offer", "so_far"]))
              name   bid  offer    so_far
0         Man City  78.0   79.5  (25, 13)
1          Arsenal  77.5   79.0  (31, 13)
2          Chelsea  76.5   78.0  (27, 13)
3          Man Utd  72.0   73.5  (22, 13)
4        Liverpool  67.5   69.0  (24, 13)
5        Tottenham  64.5   66.0  (21, 13)
6          Everton  61.0   62.5  (24, 13)
7      Southampton  55.5   57.0  (22, 13)
8        Newcastle  52.0   53.5  (23, 13)
9          Swansea  46.0   47.5  (15, 13)
10     Aston Villa  45.0   46.5  (16, 13)
11       West Brom  44.5   46.0  (15, 13)
12           Stoke  39.5   41.0  (13, 13)
13            Hull  39.5   41.0  (17, 13)
14        West Ham  39.0   40.5  (13, 14)
15         Norwich  38.0   39.5  (14, 13)
16         Cardiff  37.5   39.0  (13, 13)
17          Fulham  34.0   35.5  (10, 13)
18      Sunderland  33.0   34.5   (8, 13)
19  Crystal Palace  27.5   29.0  (10, 14)
Okay, looks pretty sensible. What we want to do now is to set up a simulation process for season points. The mid-market prices from the table above represent expectations of how many points each team is likely to get. But of course that’s just an expectation; it’s entirely possible for each team to get more points than the offer price, or fewer points than the bid price; for each team there’s a distribution of points around this mean expectation.
But what do these points distributions look like?
Well, they are certainly bounded; it’s not possible for a team to get fewer than zero points, nor more than 114 (in the case of the Premier League; 3 points * 38 games). In fact the distributions are bounded more narrowly than this; given we are roughly a third of the way through the season (13 or 14 games played out of 38), a team can’t get fewer than their current number of points, nor is it possible for any team to get precisely 114 since every team has now lost at least one point through a loss or a draw.
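The tighter bounds follow directly from the so_far tuples in the table. A minimal sketch, assuming a 38-game season and 3 points for a win (the helper name is hypothetical):

```python
def points_bounds(so_far, total_games=38):
    """Return (min, max) final points given a (points, games played) tuple."""
    points, played = so_far
    remaining = total_games - played
    # Lose every remaining game: stay on the current total.
    # Win every remaining game: add 3 points per game.
    return points, points + 3 * remaining

print(points_bounds((25, 13)))  # Man City:       (25, 100)
print(points_bounds((10, 14)))  # Crystal Palace: (10, 82)
```

So even the best-placed side is already capped well below the theoretical 114.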
We don’t want to make the assumption that season points are normally distributed however; it’s better to set up a function to simulate the distribution, and then look at the resulting shape.
import random

def simulate_points(quotes, paths, draw_prob=0.3):
    # every path starts from the team's current points total
    simpoints=dict([(quote["name"],
                     [quote["so_far"][0]
                      for i in range(paths)])
                    for quote in quotes])
    # each team plays every other team twice (home and away)
    ngames=2*(len(quotes)-1)
    for quote in quotes:
        midprice=(quote["bid"]+quote["offer"])/float(2)
        currentpoints, played = quote["so_far"]
        toplay=ngames-played
        # expected points per remaining game implied by the mid price
        expectedpoints=(midprice-currentpoints)/float(toplay)
        # solve 3*winprob + 1*draw_prob = expectedpoints for winprob
        winprob=(expectedpoints-draw_prob)/float(3)
        for i in range(paths):
            for j in range(toplay):
                q=random.random()
                if q < winprob:
                    simpoints[quote["name"]][i]+=3
                elif q < winprob+draw_prob:
                    simpoints[quote["name"]][i]+=1
    return [{"name": key,
             "simulated_points": value}
            for key, value in simpoints.items()]
Now there’s a lot to quibble about with this function. Specifically, it takes no account of the remaining fixtures and blindly assumes that all future games are played against teams of equal quality. I’ll leave you to think about how to remedy this as homework. In the meantime however it serves as a useful tool with which to explore the distributions; let’s simulate and look at the first and second moments:
import numpy as np
import pandas as pd

SimulatedPoints=simulate_points(MarketQuotes, paths=50000)
MidPrices=dict([(quote["name"], (quote["bid"]+quote["offer"])/float(2)) for quote in MarketQuotes])
for row in SimulatedPoints:
    row["mid"]=MidPrices[row["name"]]
    row["mean"]=np.mean(row["simulated_points"])
    row["stdev"]=np.std(row["simulated_points"])
    row["error"]=row["mid"]-row["mean"]
print(pd.DataFrame(sorted(SimulatedPoints, key=lambda row: -row["mid"]), columns=["name", "mid", "mean", "stdev", "error"]))
              name    mid      mean     stdev     error
0         Man City  78.75  78.71340  5.542557   0.03660
1          Arsenal  78.25  78.31092  6.133936  -0.06092
2          Chelsea  77.25  77.19956  5.903095   0.05044
3          Man Utd  72.75  72.77466  5.822711  -0.02466
4        Liverpool  68.25  68.25186  6.274254  -0.00186
5        Tottenham  65.25  65.20336  6.292056   0.04664
6          Everton  61.75  61.76346  6.434456  -0.01346
7      Southampton  56.25  56.25740  6.401914  -0.00740
8        Newcastle  52.75  52.78540  6.233348  -0.03540
9          Swansea  46.75  46.82266  6.358935  -0.07266
10     Aston Villa  45.75  45.76336  6.241635  -0.01336
11       West Brom  45.25  45.28530  6.292197  -0.03530
12            Hull  40.25  40.25510  5.770000  -0.00510
13           Stoke  40.25  40.29252  6.104398  -0.04252
14        West Ham  39.75  39.69928  5.977452   0.05072
15         Norwich  38.75  38.78156  5.877997  -0.03156
16         Cardiff  38.25  38.21546  5.909287   0.03454
17          Fulham  34.75  34.70772  5.890475   0.04228
18      Sunderland  33.75  33.76118  5.974816  -0.01118
19  Crystal Palace  28.25  28.22622  5.119481   0.02378
The error column is the difference between the mid-market quote and the mean of the simulated distribution. You’ll notice that all the errors are small (under a tenth of a point), and could be reduced further by increasing the number of paths. What’s important here is that the simulation mean is converging on the market mean, which means (no pun intended) that our simulation is consistent with market prices.
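How much would more paths help? As a rough guide, and assuming the paths are independent, the standard error of a simulated mean is the standard deviation divided by the square root of the number of paths. The sketch below uses a round standard deviation of 6 points, which is an assumption in the right ballpark for these teams rather than a figure from the simulation; the helper name is hypothetical.

```python
import math

def standard_error(stdev, paths):
    """Standard error of a Monte Carlo mean built from `paths` independent samples."""
    return stdev / math.sqrt(paths)

# Quadrupling the number of paths only halves the expected error.
for paths in (50000, 200000, 5000000):
    print(paths, round(standard_error(6.0, paths), 4))
```

At 50,000 paths this comes out at about 0.027 points, so errors of a few hundredths of a point are what you would expect to see; getting another decimal place of accuracy needs roughly a hundred times as many paths.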
We’ve also generated the standard deviation of each distribution. What’s interesting here is that the numbers for each team are reasonably similar (between roughly 5 and 6.5) but also that the numbers are noticeably higher in the middle of the pack. If you think about how points for win/draw/loss are distributed, this makes sense; a weak team like Crystal Palace will generally be picking up zeros and ones; a mid-ranking team like Swansea will be picking up zeros, ones and threes; whilst a strong team like Man City will typically be picking up threes and ones only.
Of these three categories, the smallest variance is for Crystal Palace (0, 1) whilst the largest is for Swansea (0, 1, 3); Man City are somewhere in the middle (1, 3).
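This ordering can be checked without simulating at all. Under the same assumptions as simulate_points (independent games, a fixed draw probability of 0.3, and a win probability backed out of the mid price), the per-game variance has a closed form, and the variances of independent games simply add. The sketch below is a sanity check under those assumptions; the function name is hypothetical and the inputs are the mid prices and so_far values from the tables above.

```python
import math

def analytic_stdev(mid, so_far, total_games=38, draw_prob=0.3):
    """Standard deviation of final points under the simulation's assumptions."""
    points, played = so_far
    to_play = total_games - played
    per_game = (mid - points) / float(to_play)  # expected points per remaining game
    win_prob = (per_game - draw_prob) / 3.0     # solves 3*win_prob + draw_prob = per_game
    second_moment = 9 * win_prob + draw_prob    # E[X^2] for X in {0, 1, 3}
    variance = second_moment - per_game ** 2    # per-game variance
    return math.sqrt(to_play * variance)        # independent games: variances add

for name, mid, so_far in [("Man City", 78.75, (25, 13)),
                          ("Swansea", 46.75, (15, 13)),
                          ("Crystal Palace", 28.25, (10, 14))]:
    print(name, round(analytic_stdev(mid, so_far), 2))
```

The results come out at roughly 5.5, 6.3 and 5.1 respectively, in line with the simulated stdev column, with the mid-table side showing the widest spread.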
Now points distributions are interesting, but they are not the real prize. There are very few contracts out there which are direct functions of season points; in fact other than Season Points themselves, I can’t think of any. There are however very many contracts which are functions of finishing positions – think Winner, Promotion, Top 6, Relegation etc – and finishing positions are direct functions of the number of season points a team achieves.
So what we need is a function to convert our season points distributions to finishing position probabilities:
def calc_position_probabilities(simpoints):
    paths=len(simpoints[0]["simulated_points"])
    # one counter per finishing position, per team
    positionprob=dict([(team["name"],
                        [0 for i in range(len(simpoints))])
                       for team in simpoints])
    for i in range(paths):
        # rank teams by simulated points on this path, highest first
        sortedpoints=sorted([(team["name"], team["simulated_points"][i])
                             for team in simpoints],
                            key=lambda x: -x[-1])
        for j in range(len(simpoints)):
            name=sortedpoints[j][0]
            positionprob[name][j]+=1/float(paths)
    return [{"name": key,
             "position_probabilities": value}
            for key, value in positionprob.items()]
All this function does is loop over each simulation path, rank teams according to the number of points scored, and then create a histogram of the rankings for each team; this histogram is equivalent to a vector of finishing position probabilities.
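To see the ranking-and-histogram idea on something small enough to check by hand, here is a sketch of the same calculation in NumPy on a toy league of three teams and four paths. It is an alternative formulation rather than the function above: the function name and the toy numbers are made up, and ties are broken by input order (a stable sort), which is also what Python's sorted() does.

```python
import numpy as np

def position_probabilities(points):
    """points: array of shape (teams, paths). Returns a (teams, positions) array."""
    points = np.asarray(points)
    n_teams, n_paths = points.shape
    # For each path, list team indices from most to fewest points.
    order = np.argsort(-points, axis=0, kind="stable")
    probs = np.zeros((n_teams, n_teams))
    for position in range(n_teams):
        # Count how often each team occupies this position across all paths.
        counts = np.bincount(order[position], minlength=n_teams)
        probs[:, position] = counts / float(n_paths)
    return probs

toy = [[80, 75, 82, 79],   # team A
       [78, 77, 70, 81],   # team B
       [60, 65, 62, 58]]   # team C
print(position_probabilities(toy))
```

Teams A and B swap first and second place on alternate paths, so each gets 0.5 in the first two columns, while team C finishes last every time. Each row and each column sums to one, which is a handy check on any implementation.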
Finally we need a heatmap generation function:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# http://stackoverflow.com/questions/14391959/heatmap-in-matplotlib-with-pcolor
def generate_heatmap(data, size, colourmap, alpha=1.0):
    # order teams by expected finishing position, best first
    sorted_data=sorted(data, key=lambda row: np.inner(np.arange(len(data)), row["position_probabilities"]))
    df=pd.DataFrame([row["position_probabilities"] for row in sorted_data],
                    index=[row["name"] for row in sorted_data],
                    columns=np.arange(1, len(sorted_data)+1))
    fig, ax = plt.subplots()
    heatmap=ax.pcolor(df.values, cmap=colourmap, alpha=alpha)
    fig.set_size_inches(*size)
    ax.set_frame_on(False)
    # put the major ticks at the middle of each cell
    ax.set_yticks(np.arange(df.shape[0])+0.5, minor=False)
    ax.set_xticks(np.arange(df.shape[1])+0.5, minor=False)
    # table-like layout: best team at the top, positions along the top edge
    ax.invert_yaxis()
    ax.xaxis.tick_top()
    ax.set_xticklabels(df.columns, minor=False)
    ax.set_yticklabels(df.index, minor=False)
    # plt.xticks(rotation=90)
    ax.grid(False)
    # hide the tick marks but keep the labels (used here in place of the
    # per-tick tick1On/tick2On attributes, which newer Matplotlib releases deprecate)
    ax.tick_params(axis="both", which="both", length=0)
And we’re ready to go:
PositionProbabilities=calc_position_probabilities(SimulatedPoints)
generate_heatmap(PositionProbabilities, size=(6, 6), colourmap=plt.cm.Reds)
Et voila, a heatmap of finishing position probabilities for the Premier League.
Couple of things stand out – how Crystal Palace are rooted to the bottom, how the Top Six are rapidly splitting into a Top Three group (Man City, Arsenal, Chelsea) plus the rest, and how there’s a lot more uncertainty regarding finishing positions in the bottom half of the table (less intense colours) than there is in the top, where each team seems to be trading in a three-position range.
You could take this analysis a lot further – one obvious thing to do would be to price Winner, Top 6, Relegation bets as functions of finishing position probabilities; I’ll leave that one as an exercise. My main point is simply to demonstrate that there’s a lot more information embedded in some market prices than might initially meet the eye, and that it’s generally possible to extract it with a small amount of code, some distributional assumptions and a little imagination.
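For anyone who wants a head start on that exercise, here is one possible sketch. It assumes a vector of finishing position probabilities ordered from first place to last, treats Relegation as finishing in the bottom three (as in the Premier League), and quotes fair decimal odds with no bookmaker margin. The function names are hypothetical and the four-team probabilities are invented for illustration.

```python
def price_markets(position_probabilities, top_n=6, relegated=3):
    """Turn finishing position probabilities into market probabilities."""
    return {"winner": position_probabilities[0],
            "top_%d" % top_n: sum(position_probabilities[:top_n]),
            "relegation": sum(position_probabilities[-relegated:])}

def fair_decimal_odds(probability):
    """Fair (zero-margin) decimal odds; None if the probability is zero."""
    return 1.0 / probability if probability > 0 else None

# Hypothetical probabilities for one team in a four-team league.
probs = [0.5, 0.25, 0.125, 0.125]
markets = price_markets(probs, top_n=2, relegated=1)
print(markets)
print(fair_decimal_odds(markets["winner"]))  # 2.0
```

A zero probability here just means the outcome never occurred in the simulation, not that it is impossible, so long-shot prices need many more paths (or some smoothing) before they can be trusted.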
The post Simulating Finishing Positions appeared first on Sports Trading Network.