Some people, when confronted with a problem, think “I know, I’ll use regular expressions”.
Some people, when confronted with a graph problem, think "I know, I'll use PageRank".
Being a member of both groups, I was wondering about all the data analyses OkCupid was running - and how can we actually measure dating potential correctly? Using the chance of getting a reply to your messages, as OkCupid's analyses do, is strongly biased towards Internet trolls who are really good at getting replies - but that's not the concern of this post. Let's say we actually have a good measure of how attracted one person is to another (and for simplicity assume it's between 0 and 1, with values in between interpreted as either levels or probabilities).
The problem is that even such a simple kind of attractiveness requires graph analysis - counting how many people are attracted to someone is not terribly useful, as it ignores some distinctions that proper graph analysis would catch:
- how attractive are the people attracted to you? (by recursive graph analysis)
- how many other people are they attracted to in addition to you? (or do they simply have low standards?)
- is it mutual, or are you attracted to the wrong kind of people?
What I'm going to do is to turn every person into a website, attractiveness into linking, and compute PageRank on the entire dating market. Scenarios presented here are hypothetical, as right now I'm more interested in figuring out proper analysis tools than in running any real analysis before I know if I'm measuring a sensible thing or not.
By the way, these graph measures work surprisingly well in the presence of an arbitrary mix of hetero-, homo-, and bisexuals in the network, and it's fine if some people are homophobes or ... let's call them "biphiles" - guys who are really into lesbians (think threesomes) and girls who are really into gays - and in either case typically preferring a bisexual opposite gender partner to a heterosexual one - we can just note that in their attractiveness links.
The most naive thing to measure would be PageRank of "link to someone if you're attracted to them".
Scenario 1. Let's populate the world with 10 ensembles of girls and 10 ensembles of guys separated by which decile of physical attractiveness they fall into. People are only attracted to those at least as attractive as they are.
The results are quite drastic. Because the 10s are desired by everyone, yet extremely picky, they get really high scores - AttractiveRank by decile is 0.110 0.122 0.138 0.158 0.186 0.227 0.292 0.418 0.759 7.591 - and that's only because the model includes a 10% leak - traditionally meant to represent "user getting bored and typing a new URL" - here maybe "someone getting really really drunk at some party".
Let's modify it a bit and create Scenario 2 - with people also attracted to those just one decile below them. So 7s are attracted to everyone from 6 to 10 and so on.
This is great news for 9s, and pretty good news for everyone except 10s - now the ranks are 0.124 0.140 0.162 0.194 0.253 0.394 0.780 1.715 3.119 3.119. All that thanks to just a slight lowering of everyone's standards.
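For the curious, here's a minimal Python sketch of how AttractiveRank for scenarios 1 and 2 could be computed. It is an illustration under stated assumptions, not the downloadable code: guys and girls are treated as symmetric, so each decile collapses into a single node whose link to itself stands in for the opposite-gender ensemble of the same decile; ranks are scaled so the average is 1; and the names `pagerank` and `attraction` are made up for this example.

```python
def pagerank(weights, leak=0.1, iterations=500):
    """Damped power iteration; ranks are scaled so that the average rank is 1."""
    n = len(weights)
    ranks = [1.0] * n
    for _ in range(iterations):
        new = [leak] * n  # the 10% leak lands evenly on everyone
        for i, row in enumerate(weights):
            total = sum(row)
            for j, w in enumerate(row):
                if w:
                    # node i splits the rest of its rank among its links
                    new[j] += (1 - leak) * ranks[i] * w / total
        ranks = new
    return ranks


def attraction(n, slack):
    """Decile i (0-based) links to every decile j with j >= i - slack."""
    return [[1.0 if j >= i - slack else 0.0 for j in range(n)] for i in range(n)]


scenario1 = pagerank(attraction(10, slack=0))  # only equal or better
scenario2 = pagerank(attraction(10, slack=1))  # one decile lower is fine too
print(" ".join(f"{r:.3f}" for r in scenario1))
print(" ".join(f"{r:.3f}" for r in scenario2))
```

Under these assumptions the two printed lines should come out close to the decile ranks quoted above. The leak is what keeps the 1s from dropping all the way to zero.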
There's a serious problem with this anyway. In scenario 1, 10s could have any 3s or 8s if they wanted - but they very much don't. So everybody is actually in a very similar situation - 3s have as many other 3s to date as 10s have 10s.
Let's introduce ReciprocityRank - which is just like regular AttractiveRank except non-mutual links are dropped. So if you're a 7 and you wouldn't actually mind that 9 over there if you thought you had any chance, but as a good graph analyst you know you don't - you forget about that 9 in no time.
For scenario 1 ReciprocityRank correctly identifies that everyone has the same rank of 1. Scenario 2 is more interesting - with ranks of 0.781 1.098 1.058 1.036 1.027 1.027 1.036 1.058 1.098 0.781. First, it is strangely symmetrical - it sucks as much to be a 10 as it does to be a 1 - in one case because your standards are too high, in the other because everyone else's standards aren't low enough - but the effect is the same. What might be more surprising is how 2s and 9s are in the best position - not only can they date other 2s and 3s / 8s and 9s respectively - they have access to a large number of fairly desperate 1s or 10s. Yes, 10s' desperation comes from unreasonable standards, but it doesn't matter (mathematically that is).
That's probably not what we're looking for. You'd think that these 10s are in a better situation - they might not have much choice but it's all quality choice. And if you're a really desperate 10 you can always change your mind about one of those 8s that stalk you. A 1 doesn't have such luxury. So let's count unreturned links, but only for 1/5 as much as returned ones, sort of like rel="nofollow" about which search engines changed their mind so many times.
Scenario 1 results in DateRank by decile of 0.147 0.167 0.194 0.229 0.281 0.359 0.493 0.759 1.474 5.897 - quite reasonable. Scenario 2 results in 0.210 0.282 0.336 0.424 0.569 0.808 1.194 1.755 2.408 2.013 - 9s are strangely still better than 10s - but notice that everyone is equally attracted to both, it's only 9s' lower standards which increase their dating potential.
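Here's a small Python sketch of how one discount parameter can produce all three ranks so far. It assumes the same collapsed one-node-per-decile model with a 10% leak, and it assumes a link counts as returned whenever the other side has any attraction back at all. The names `rank`, `deciles` and `stalker` are invented for this example and need not match the original code.

```python
def rank(attraction, stalker, leak=0.1, iterations=500):
    """PageRank over attraction links, with unreturned links scaled by `stalker`."""
    n = len(attraction)
    # A link i -> j keeps full weight if j links back to i, else it is discounted.
    weights = [
        [attraction[i][j] * (1.0 if attraction[j][i] else stalker) for j in range(n)]
        for i in range(n)
    ]
    ranks = [1.0] * n
    for _ in range(iterations):
        new = [leak] * n
        for i, row in enumerate(weights):
            total = sum(row)
            for j, w in enumerate(row):
                if w:
                    new[j] += (1 - leak) * ranks[i] * w / total
        ranks = new
    return ranks


def deciles(slack):
    """Decile i (0-based) is attracted to every decile j with j >= i - slack."""
    return [[1.0 if j >= i - slack else 0.0 for j in range(10)] for i in range(10)]


for name, stalker in [("ReciprocityRank", 0.0), ("AttractiveRank", 1.0), ("DateRank", 0.2)]:
    for scenario, slack in [(1, 0), (2, 1)]:
        ranks = rank(deciles(slack), stalker)
        print(name, "scenario", scenario, " ".join(f"{r:.3f}" for r in ranks))
```

A discount of 0.0 drops unreturned links entirely, 1.0 treats them like any other link, and 0.2 is the 1/5 compromise. This sketch only covers all-or-nothing attraction; how partial attraction should combine with the discount is a separate modelling choice it does not try to guess.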
Now that our links have weight, let's try scenario 3 - people are attracted to those less attractive than them only 50% as much as to those equally or more attractive than they are. Surprisingly it only helped 9s and 10s, while everyone else lost a little, and now DateRank is 0.188 0.235 0.278 0.349 0.469 0.686 1.080 1.730 2.558 2.426.
Height and education
Now that we have a sensibly defined PageRank - let's try something closer to reality. Let's say guys and girls come in three tertiles by height - short, medium, and tall; and in three independent tertiles by intelligence and education - let's say dumb, average, and smart. Of course they're not directly comparable across gender lines - what counts as tall for a woman might count as medium for a man.
So the rules of scenario 4 are:
- Women don't like men shorter than them
- Women don't like men dumber than them
- Men don't like women taller than them
- Men don't like women either smarter or dumber than them
It's all loosely based on this paper, I'm not making this up. Could you guess who has it easier and who has it harder in such a situation?
Here's the table for guys (rows by height, columns by intelligence):
         dumb   average  smart
short    0.240  0.401    0.985
medium   0.380  0.703    1.873
tall     0.550  1.047    2.821
And here's one for girls:
         dumb   average  smart
short    0.652  1.091    2.676
medium   0.436  0.730    1.789
tall     0.265  0.414    0.946
So how good was the analysis? Let's look at girls' table first. It's best for a girl to be short and smart. We could have predicted the shortness - guys not liking taller girls was explicitly in the rules - but why would a girl want to be smart if equally many guys will be attracted to her regardless? This is a good example of why graph analysis matters. If a girl is dumb, plenty of men will be attracted to her, but these will all be dumb men shunned by most. If a girl is smart, she gets exactly as many men to choose from, but these are some quality men.
And unsurprisingly it pays for a man to be either taller or smarter - but intelligence is a far greater factor than height. How did that happen? And why does the same hold for girls? (except with reverse order of height) In this model men are more selective on intelligence than on height - and only same-intelligence combinations are possible (3 combinations out of 9 are ok), while plenty of mismatched height combinations are possible (6 combinations out of 9 are ok). You decide how meaningful this is.
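Scenario 4 can be sketched in Python along the following lines. The assumptions: one equally sized node per (sex, height tertile, intelligence tertile), tertiles compared by index within each gender's own scale, a leak of 0.1, and unreturned links counted at 0.2. The names `likes`, `date_rank` and `people` are invented for this illustration and are not taken from the original code.

```python
from itertools import product

LEAK, STALKER = 0.1, 0.2
LEVELS = range(3)  # 0 = short/dumb, 1 = medium/average, 2 = tall/smart

# One node per (sex, height tertile, intelligence tertile), all equally sized.
people = [(sex, h, iq) for sex in "MF" for h, iq in product(LEVELS, LEVELS)]


def likes(a, b):
    """Scenario 4 rules; tertiles are relative to each person's own gender."""
    (sex_a, h_a, iq_a), (sex_b, h_b, iq_b) = a, b
    if sex_a == sex_b:
        return False
    if sex_a == "F":  # women: no men shorter or dumber than them
        return h_b >= h_a and iq_b >= iq_a
    return h_b <= h_a and iq_b == iq_a  # men: no taller women, same intelligence


def date_rank(nodes, iterations=500):
    # Returned links weigh 1, unreturned ones STALKER; weights are normalised per node.
    weights = {
        a: {b: (1.0 if likes(b, a) else STALKER) for b in nodes if likes(a, b)}
        for a in nodes
    }
    ranks = {a: 1.0 for a in nodes}
    for _ in range(iterations):
        new = {a: LEAK for a in nodes}
        for a, out in weights.items():
            total = sum(out.values())
            for b, w in out.items():
                new[b] += (1 - LEAK) * ranks[a] * w / total
        ranks = new
    return ranks


ranks = date_rank(people)
for sex in "MF":
    print("guys" if sex == "M" else "girls")
    for h in LEVELS:  # rows short to tall, columns dumb to smart
        print(" ".join(f"{ranks[(sex, h, iq)]:.3f}" for iq in LEVELS))
```

With these assumptions the printed rows should land close to the two tables above. Every link a man makes is returned under these rules (a woman at or below his height and at his intelligence never objects to him), so only women spend any link weight on unreturned attraction.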
A couple months ago OkTrends wrote an interesting post about men not being terribly attracted to older women. Now I'm not going to use their data because I'm too lazy to convert pictures into numbers, and it's rather noisy, so the rules of scenario 5 are:
- There are 31 ensembles of men and women, for each age between 18 and 48.
- Women are attracted to men no older than their age + 6, and no younger than their age - 4; and half as attracted to men between their age - 8 and their age + 10.
- Men are attracted to women at most 4 years older than them, and this is a hard limit, no exceptions.
- Men are most attracted to women older than their minimum target - which grows linearly from 18 at age 23 to 32 at age 48; however they're still open to younger girls (0.5x).
And these are the results - unsurprisingly men peak later than women (around 28, not 22), and while they're not as hot at their peak as women, they don't crash as hard as women either.
There are some limitations of this analysis - first, because OkCupid's data used a hard cutoff at 18 years, it discounts the possibility that a given 18-year-old might wish to date someone who's 16 or 17 - their actual potential is higher than indicated - and likewise for older people. By the way it is blatantly obvious in OkCupid's graphs that there are plenty of early-30s women who lie and pretend they're exactly 29. And if you've ever been there you'll know that there are quite a few under-18s who pretend to be 18. So take their data with a pinch of salt.
Another problem is the assumption that all ensembles are equally sized - in reality, while there are about as many 48-year-old women alive as there are 22-year-olds, there are far fewer of them on the dating market. Let's run scenario 6 - with ensembles of under-30s being 5x as numerous as ensembles of over-40s, with sizes decreasing linearly during the 30s.
The peaks stayed where they were - but the fall with age is far more drastic, especially for women.
Offensive misogynist advice
Guardian-reading liberals are asked to skip this section.
To the extent that these models are valid - the advice they give to men seeking long term relationships would be to go after teenage girls before their peak; while men looking for a quick hookup would do better going after women in their late 20s or early 30s - when they are still hot but already getting desperate. And in either case go after taller girls to avoid competition.
The model's advice to women is - get pregnant while you can. Unless you're still a teen, your dating value is only going down, and the quality of men you will be able to attract for a serious relationship is getting worse each year. And remember - your biological clock is ticking, and setting up a family takes years (accidents excepted). You might think that you're still hot now, but what if after 2 years with someone you decide he's not the right guy, then you spend 2 more years with someone else, then you decide to perhaps maybe try getting a baby - nowhere near as quick and easy as it seems in your 30s - all while your younger friends look more and more tempting to your man; while your eggs are less and less cooperative. The legal system is what it is - in most countries getting married and having a few babies sets a woman up for life, so even if it doesn't seem terribly appealing now, you need to decide with an eye to what will be appealing in a decade.
That, or freeze your eggs and start saving for regular botox. Works either way.
Don't blame me - it's the reality which is misogynist.
Code and mathematics
Code can be downloaded here.
One problem with this approach is the existence of two completely arbitrary numbers - booze factor (0.1) and stalker factor (0.2) - they are not derived from anything meaningful and changing them can completely change the results. In fact the booze factor gives a hard floor to how low anyone's attractiveness can fall, and three values of stalker factor (0.0, 1.0, 0.2) result in three completely different ranks discussed here.
Another problem is the assumption of a constant ratio of mutual to unreturned attraction. We might as well assign a constant percentage of links to mutual attraction and the rest to all links bundled together. It matters as people with low standards would focus on attraction returned instead of wasting their link power / time on stalking. It would be a fairly straightforward modification, and I don't know which version would work better.
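To make the roles of the two factors concrete, here is one way to write the rank equation. It assumes the usual damped PageRank form with ranks scaled to an average of 1, which is consistent with the numbers quoted for scenarios 1 and 2. The symbols are notation introduced here, not taken from the code: \lambda is the booze factor, s the stalker factor, a_{ij} the attraction of i to j, and r_j the rank of j.

$$
w_{ij} = a_{ij} \cdot
\begin{cases}
1 & \text{if } a_{ji} > 0 \\
s & \text{otherwise}
\end{cases}
\qquad
r_j = \lambda + (1 - \lambda) \sum_i \frac{w_{ij}}{\sum_k w_{ik}} \, r_i \;\ge\; \lambda
$$

Every term in the sum is non-negative, so nobody can fall below \lambda = 0.1. Setting s = 0 gives ReciprocityRank, s = 1 gives AttractiveRank, and s = 0.2 gives DateRank. As a worked case, the 1s in scenario 1 are linked only by themselves, with one returned link (weight 1) and nine unreturned ones (weight 0.2 each, total 2.8), so r_1 = 0.1 + 0.9 r_1 / 2.8, which gives r_1 = 0.1 / (1 - 0.9/2.8), about 0.147.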
And remember - all models are wrong, some are useful.