After a little over 15 months, “These are some mountains, and this is a lake” has finally accumulated 2500 views. Somehow.
Honestly, I’m not sure how it happened. For a popular blog, I’m sure this would be abnormally low views per time, but all I do is complain about neckbeards and pop science with the occasional picture of poops or food or both. I’ve linked to it a few times on Facebook, but I highly doubt all the traffic is repeat visits from assholes I’m “friends” with. So it really begs the question: who is reading this blog?
TIME FOR SOME STATS!
Ok, so really, who is looking at my blog? Luckily WordPress keeps some decent stats for me, with a country-by-country breakdown. You can probably guess which country is at the top of the list…
Yes, folks, the good ol’ United States of Apathy is on top yet again, with 1462 views. Considering I am an American, and this blog is written in American English, this isn’t surprising at all. The next three highest views shouldn’t be surprising either.
Australia, the United Kingdom, and Canada, the three other major English-speaking countries of the world, take the next three slots. Australia has 243 views, the UK has 193, and Canada has 142. Funnily enough, while looking at these, I discovered Australia owns an island far to the south of New Zealand, called Macquarie Island.
Macquarie Island, the land that most of us never knew Australia owned. Or existed. You can tell I’m not lying by the beautiful Google watermark.
Apparently it’s part of Tasmania, and home to the world’s royal penguin population while they are nesting…
as well as this person who has the worst taste in pants. But I digress.
After the big four come a parade of mostly western European countries with languages we took in high school.
Germany – 91 Views
France – 62 Views
Sweden – 37 Views
Norway – 21 Views
Ireland – 21 Views
Poland – 20 Views
Unfortunately they didn’t offer Irish at my high school. They did offer Irish coffee, though. It wasn’t like an official thing, just something the lunch ladies did every other Thursday when they got their paychecks…
Moving right along, the views start to taper off around the rest of the countries who have found my dear blog. The honorable mentions between 10 and 20 are:
Netherlands – 18 Views
Romania – 16 Views
Malaysia – 13 Views
Finland – 13 Views
Spain – 12 Views
Russian Federation – 11 Views
Mexico – 11 Views
Estonia – 10 Views
Brazil – 10 Views
I’m trying my best right now to not be upset over the fact that there have only been 11 views from the entire Russian Federation in the history of my blog. I blame Putin. For pootin’ on my stats.
Small southern Pacific Ocean countries have the next three slots:
New Zealand – 8 Views. Obviously not “zeal”ous about my content.
Singapore – 6 Views.
Philippines – 5 Views
Also tied for five views a-piece are:
Wait, seriously? Italy has only 5 views!?
At 4 views, we have 4 countries. That’s… pleasantly congruent.
At three views, we have 4 similarly-sized countries as the ones previous, including South Africa, the first entrant from the entire continent of Africa.
At two views, the list gets quite a bit longer…
Chile. Cheee-ley. Not pronounced like “chilly”, you uneducated slags.
India, a country of call centers and possible viewers like you.
United Arab Emirates
And now, the one-timers:
Ukraine. At least someone has a free moment away from political unrest and turmoil to feed my ego.
Turkey. Rhymes with jerky. Kind of looks like jerky, too.
Republic of Korea
Now if you’ve been counting, (and if you actually have, why?!) you’ll notice that the total adds up to 2502. This is because I happened to not be sitting here at the exact moment the counter rolled over. And there is a distinct possibility that Fiji really doesn’t belong in the first 2500 stats. But honestly, I don’t give a shit. So suck it.
Here’s the list of collected stats:
So how do I evaluate how much a country really likes me? Some countries, like Latvia or Uruguay, have only 1 view, but they also have different populations, economic structures, LANGUAGES… it would be a statistician’s NIGHTMARE to have to figure it all out. But I am not one of those, so we’ll do the quick and dirty evaluation.
First off, I want to normalize the population distribution; that is, I will create a ratio of views per person for each country, and then re-list them all. That way, smaller countries who may have fewer views will be weighted the same as larger countries, population-wise.
For example, the US has 313.9 million people as of the 2012 population estimate. So if I divide 1462 views by 313.9 million, I get 0.0000047531. Shit, that’s small. In scientific notation, that’s 4.7531 x 10^-6. Since I am an American, and America is number 1 (it’s probably not, but work with me here), I’ll normalize by this value. Thus, my views-to-population ratio is now equal to 1, which we can idealize a little easier. Let’s do the next country, Australia. The population of the land down-under is 22.68 million, and the views are 243, which means a ratio of 0.00001071428. That’s 10.714 x 10^-6, or, in our normalized terms, 2.25. What this means is that if the entire country of the United States of Alfalfa, babies and all, were to view my blog exactly once out of the interest and curiosity of their heart, each person in Australia would be viewing it 2.25 times. Or, in extremely simple terms, this blog is exactly 2.25 times more popular in Australia than in the US. Dang.
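For anyone who wants to redo the normalization at home, here is a minimal Python sketch of the views-per-person idea. The view counts and populations are the ones quoted above; the function names are made up for illustration. One caveat: plugging in the numbers exactly as quoted gives a US ratio nearer 4.66 x 10^-6 and an Australian score nearer 2.30 than the 4.7531 x 10^-6 and 2.25 in the text (4.7531 x 10^-6 is what 1492 views would give), so treat the output as a sanity check and not a replacement for the graph.

```python
# Views per person, normalized so the baseline country scores exactly 1.0.
# Views and populations are the figures quoted in the post.
VIEWS = {"United States": 1462, "Australia": 243}
POPULATION = {"United States": 313.9e6, "Australia": 22.68e6}


def views_per_person(country):
    """Raw ratio of views to population for one country."""
    return VIEWS[country] / POPULATION[country]


def normalized_score(country, baseline="United States"):
    """Views per person relative to the baseline country."""
    return views_per_person(country) / views_per_person(baseline)


if __name__ == "__main__":
    for name in VIEWS:
        print(f"{name}: {views_per_person(name):.4e} views/person, "
              f"score {normalized_score(name):.2f}")
```

Adding more countries is just a matter of extending the two dictionaries; the baseline can be swapped by passing a different `baseline` argument.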
When you crunch all of the numbers (populations obtained from good ol’ Wikipedia), you get something that looks like this:
Click to enlarge, or, you know, ruin your eyes trying to squint at it all tiny’d up.
The first thing that jumps out (or should jump out at you, unless you are not the kind of person to be having jumps at your abode) is that after Australia, the top ranked country, the next two countries are neither the United States of Antidisestablishmentarianism nor countries that speak English as a first language. Iceland and Estonia, even though not close to the top in views, have a higher population percent reading my blog than my own country! Damn, I’m good!
After that, it looks like you would expect: English and western countries, then eastern European, South American, Asian, and onward. But you really shouldn’t trust this graph, even though it’s normalized and I made it so it’s like, the bomb. This is for several reasons, the first and foremost being that population alone is not a strong signifier of interest. I have excluded things like language, culture, ideology, and especially economy, which directly affects the ability of people to access the internet and how much they are able to do so in a given period of time. Having a single view in a poorer country with a culture and language not like my own is a greater accomplishment than tons of views in my own country. So when I got a view in Fiji, it was a far more exciting moment than getting 20 views from the United States of Albuquerque, New Mexico.
The second major reason you should not trust this is that the majority of views are below 10. I don’t always get views from every one of these countries, and a lot of the time it’s an isolated event. In the extreme case, having 1 view in a country for the first time and then never having another one can mean a lot of things EXCEPT statistical significance. If I have several hundred views, then we can talk about what it really means, but 5 or 6 doesn’t mean anything quite yet, other than a person maybe stumbling onto my blog, or even a person from a high-view country visiting another and viewing my blog from there. I’m not told how a person reaches my blog per country, so each view I get is loaded with uncertainty.
The third major reason may have been something you initially noticed on the graph above; a lot of countries are indistinguishable from the x-axis of value 0. If I expanded my y-scale, you would start to see that they aren’t 0, but then values for Australia, Iceland, and Estonia would start to “fall off” the graph. That is, you wouldn’t be able to see the forest for the trees, and what good are these stats, then? This characteristic of data to appear 0 when everything is weighted and scaled is called “insignificance”. I mentioned statistical significance a little earlier, which is the opposite: values have true meaning. When values are insignificant, they don’t mean anything, or at least you can’t make assumptions about them. You’ve probably encountered this in a chemistry or physics class, where some tired old professor is droning on about significant figures this and his sciatica that. But “sig figs” really are very important in sampling and data analysis. If you have a final value that is 10 digits in length, but you obtained it with values only 3 or 4 digits of significance, you can’t be sure about the last 6 or 7 digits of your result. You have to trash those numbers. This happens ALL THE TIME in experimental and analytical fields like physics, biology, economics, and gourmet chef-ery. If you have a value that, when graphed, looks like it’s zero, it might as well be.
Statistical significance and insignificance may sound brutal; isn’t every sample important, especially the abnormal ones? It actually is a very useful tool for data analysis! If you measure something 100 times, and one of those measurements happens to be outrageously high, then you can toss it, because you can’t be sure if it is actually a real measurement.
Another thought experiment: in a room of 100 monkeys sitting at computers, 10 start writing Shakespeare. That’s significant. You can now start writing grants to fund your very own monkey-staffed literature factory. Ok, now same situation, but only 1 or 2 start writing Shakespeare. While wildly exciting and hilarious, your time is better spent clipping your toenails. Anything can be happening to make 1 or 2 monkeys out of 100 write Shakespeare. It could be a truly genius monkey, but none of the others are that smart, so it’s an anomaly. It could be monkey see, monkey do; maybe a monkey actually saw some Shakespeare written down, and decided to copy it. And then there is the very simple solution of some poor underpaid technician pranking you, the scientist, because they only went as far as getting a bachelor’s degree while you have a doctorate and this is their way of “stickin’ it to the man” (as you can see, I tend to not think very highly of people who stop at a bachelor’s because they’re “tired of school”, because that is a bullshit answer. Everyone is tired of school. That’s why it’s called school and not “fun times with your friends”. Pick a new reason, techs!). In short, having things be graded as significant or insignificant is an unbiased and absolutely great and easy way to sieve data.
Ultimately, the only countries I can consider and start to make assumptions about are the ones with a lot of views, similarity to my own country, and a significant population-weighted rating (see graph. see graph run. run, graph, run!). This lowers my sample space of countries down from 54 to a handful at most. But that’s statistics, which is why you should never ever trust a number that hasn’t been beaten to death.
Regardless, I might as well show the countries my blog has touched, even though there are some statistical issues…
The world as we know it. But it’s not the end, so I feel fine.
So now that this world map is up, let’s talk about something else. Who hasn’t been reading my blog?!?
If you’re like me, and you probably are, you noticed during the discussion of views that China is oddly missing. This is something that is made even more mysterious by the fact that Indonesia, Mongolia, and a large part of Eurasia are also missing. Now, admittedly, I just spoke about statistical significance, and how a lot of countries who viewed this blog can’t really be considered viewers. If you add those back in, you can consider the large swath of South Asia as largely ignorant of the fact that I am awesome. It’s a damn shame.
Another thing to notice is that Africa, apart from Egypt (which is often considered part of the Middle East community) and South Africa (which is like Australia/America Lite, if Australia and America were beers), has not clocked any views. We can all try and reason that it’s due to Africa being populated by a bunch of ignorant bush bitches, but that is an oversimplified and prejudiced view of an entire continent (minus Egypt and South Africa). Countries in Africa HAVE an internet connection, people!
The Middle East, minus a couple of countries, and the Caribbean and Central America areas round out the list of where they just don’t check out my rad lines. While some of these places can be logically explained (the Sahara does not have tech support) others really should be up there on the stats list. Out of the 7+ billion people on this planet, China accounts for 1.35 billion of them… and not a single view.
The lack of data can, in this case, also be data. If a country should be on my view list based on qualifiers like having the English language as a common knowledge, good economy/high tech availability, and large population, and then it isn’t, that would count as being data. So there is something that is directly restricting China from viewing my blog, which it should, statistically-speaking, be viewing. Now, the reasons for that are varied I’m sure, but it does beg the question for a lot of countries: why aren’t they viewing my blog?
Unfortunately, it’s not a question I can answer with any certainty (and I have been known to answer many, many questions with extreme certainty) so it has to be left as it is. There are too many unknowns for the equations we have. So, in that case, we’ll move right along and ask the next question: what are people viewing?
Again, WordPress keeps stats on what pages are the most popular and which are the duds. As a disclaimer, these are the most recent views at the time of writing, so the numbers won’t add up to be 2500. Console yourselves.
Ok, now this should really boggle the mind. The home page, the ANCHOR of this blog, is third to the Neckbeard Spotlight on trenchcoats, followed by the Neckbeard Spotlight on hats. Either people really want to know about fedoras and dusters, or it’s a bunch of neckbeards who are obsessed with reading about themselves. I’m getting ahead of myself, though.
First off, the home page stats really should be analyzed apart and away from the individual posts. This page is where you can access every single blog post. This page is also not easily findable in google searches and the like, since it can’t be “tagged” or categorized like a regular post. In this case, it’s not an accurate measurement of what’s hot and not. However, we shouldn’t ignore it; if it’s more convenient to access the home page to read new posts, but not necessarily easy to find by googling, then repeat and even *gasp* regular readers would be a likely explanation for any views. Since I have a pretty stout (with respect to total views, of course) stat for the home page, I can assume repeat readership. This is fantastic.
I do have to step back a bit, though, and think about ways I have publicized the blog. I’ve posted it to Facebook a few times and asked friends to read it (if they dared!), so I can’t assume all of those views are from people who have found my blog on their own. But, I do have stats about where I’m getting visitors from, which I’ll get to a little later. Without giving away the punchline about the referrers to this blog, a good bit of readership to the homepage is coming from Facebook, but not the majority. Even if all of the visitors to the homepage were initially referred to my blog through Facebook, which is a very conservative and possible outcome, they would have had to navigate on their own back to the homepage, since Facebook references don’t account for all homepage views. Ultimately, it points to a dedicated readership. Again, I rejoice.
The other two pages that should be analyzed away from the group are the “About” and “Merchandising” pages. While I would love to believe people found them on their own, I know better. When looking at my daily stats, views for these two pages tend not to be isolated. They are often paired with other page views, pages which I find both isolated and with other page views. This means that people are finding those other pages on their own, and then occasionally navigating to the “About” and “Merchandising” pages out of curiosity. I wish I had correlated stats for this to show what I am talking about, but you’ll just have to take my word for it, because I am too lazy to hunt through my stats for that shit.
Finally, one page that is a complete mystery and HAS to be removed from considerations is the “(unknown or deleted)” page. While it has, at some point, gotten 12 views, I don’t remember what it was. I can make a few guesses; it could be a page I deleted very early on that contained a list of places that delivered food in Charlotte, NC. It may be a post I made, realized that it wasn’t the caliber of post I wanted, and deleted it. Whatever the reason, the page is gone now, and so any speculations about what those 12 views mean are about as useful as the page itself was before I deleted it.
Without those four pages, a few things become VERY clear:
1. Posts that have anything to do with neckbeards are POPULAR.
2. Posts with images tend to get viewed.
3. People like it when I talk about/make fun of Luke. He never gets a break.
4. Fatty McButterpants, Graphs, and other self-generated art and images are often viewed.
If you think about it a little bit, you can make a few inferences. The two conclusions I can make about posts that will garner views are:
A. Posts with pictures that have common search terms (neckbeard, the big bang theory, your mom, etc.)
B. Posts that are tagged and have common themes (Fatty McButterpants, Luke, Crap Reflections…)
Admittedly, there are some posts that defy either of these conclusions and get views, but it gives a pretty good idea of how to write future posts if my intent is to increase publicity. But how do I meld what I want to write about and what people want to read? How do my stats work for me, and show me what exactly people would like to see?
Now it comes time to go back to the stats themselves. The top post is, again, a neckbeard spotlight on trenchcoats. There are a lot of images, and easily searchable terms, so it’s not a hard post to find (google “neckbeard trenchcoat”, you see this post if you look down into the results). But, again, we have to consider other things, things that stats like these can obfuscate; maybe the trenchcoat post, like the USA being the top viewer, isn’t what people want to read, really. Statistic items like number of pictures, popularity of posts in searches, and length are going to play a part in the total views, and a hard-to-find post with lots of views points to true readability. However, something that is probably the easiest as well as the most accurate metric to weigh these views against is length of time published.
For example, the top-viewed post has been published since March 19th, 2013. Using a neat website called DayCalc, I can find the number of days between then and now pretty easily; this post is 444 days old. Again, using the ratio from the countries comparison, the top-viewed post has a view factor of 1.85. That means this post is viewed, on average, about once or twice a day since it was published. That’s bitchin’ news (well, for a post about neckbeards’ creepy trenchcoats, this is really quite fantastic)! How do the rest of the posts stack up, though?
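The views-per-day arithmetic is easy to reproduce without a website. Below is a rough Python sketch using the standard `datetime` module. Two things in it are assumptions and not figures from the post: the "now" date of June 6th, 2014 (picked because it makes the post exactly 444 days old) and the view total of 821, which is simply back-calculated from 1.85 x 444 since the actual count isn't quoted.

```python
from datetime import date

PUBLISHED = date(2013, 3, 19)  # publication date given in the post
AS_OF = date(2014, 6, 6)       # ASSUMED "now": the date that yields 444 days


def age_in_days(published, as_of):
    """Number of days a post has been up."""
    return (as_of - published).days


def view_factor(total_views, published, as_of):
    """Average views per day since publication."""
    return total_views / age_in_days(published, as_of)


if __name__ == "__main__":
    # HYPOTHETICAL view total, back-calculated from the quoted 1.85 factor.
    hypothetical_views = 821
    print(age_in_days(PUBLISHED, AS_OF), "days old")
    print(round(view_factor(hypothetical_views, PUBLISHED, AS_OF), 2), "views/day")
```

Running the same function over every post with its own publication date and view count would give the age-adjusted ranking discussed next.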
The results show that the trenchcoat post remains close to the top, but the fedora post is actually more popular. A few other posts are similarly popular, but they quickly taper off after that. An easier way to view this is graphically:
Each number in the graph corresponds to the ranked post in the list. What we can see is that the fedora post is roughly twice as popular as the trenchcoat post, which is roughly twice as popular as the brony post. Results of 4 through 10 (that is, “Why the show “The Big Bang Theory” should die” through “Ableism: You’re All Fucking Guilty” on the ranked list) are significant. After that, all other posts shouldn’t be considered. What this tells me is that people tend to read posts about neckbeards, “The Big Bang Theory”, and anything really truly angry or inflammatory. And if you think about it, it makes some sense. Neckbeards like the internet, and what they love to do is go read pages and then bash them. I’ve tapped into a great source of views!
It doesn’t, however, mean that topics other than this are forever cursed and that I should abandon any future posts with related content. It just means that if I want to maximize my views, I should focus more on topics that do garner more attention. Even more than that, it means that the low-view posts don’t give me enough information. It may be popular content, but they may be tagged incorrectly or not tagged at all. It may be tough to find them in search results. So how do I increase the likelihood of views on these posts?
An easy and simple solution is to re-evaluate tagging. An even easier and simpler solution, though, is to leave it be and keep on writing new content. Remember, a view is a view is a view, regardless of being from a new post or old. And I have to think about my time in terms of expenditure, since I am a graduate student and my resources are at a premium. Writing this blog is a hobby for me that is completely free. So while I can sit down and re-work posts and tags and such, I already have posts that are garnering views in place without doing anything. It wouldn’t be fun at all, either, to go through and fix posts for more views, which I don’t make money from anyway. I may make an extra 100 views by re-tooling things around, but I also may end up not getting any extra views from this. In the end, I have a system that works, both for me and the readership I do have.
This should probably answer the question of what people are reading, which brings me to the last topic: How are people getting here?
There are two interesting sets of data WordPress provides to answer this question. The first set of data shows references, which I’ve already mentioned.
As you can see, I got most of my views from search engines, specifically from Google Web and Image search. I do far less well with Bing and Yahoo, which isn’t exactly sending me to bed and crying myself to sleep. Facebook does a good job for me, but this is also where I publicize my blog specifically, so I expect references from there to be high. Reddit gave me 34 views, which I assume was the result of some gross neckbeard posting my blog there and crying about how I don’t understand his “chivalry” and “euphoria”. As far as I am concerned, someone could have called me an awful stuck-up bitch (marginally accurate it may be, though) and I won’t care, because they clicked on it and came here. Again, a view is a view is a view. I am not too proud to take a view from someone who hates me… mostly because the stats counter takes them for me.
After that, it’s just some residual hits from foreign Googles. That’s it. That’s how I got my views. No analysis, no hand-waving, no magical numbers-crunching to see what it “really” means.
The second set of data WordPress provides that goes hand-in-hand with the references list is the search terms list, which I find to be far more interesting. This is what people are googling, or binging, or, God forbid, yahooing that brings ’em on over to me.
Here are some stats we can work with now. We can’t do anything with the unknown search terms; privacy settings prevent WordPress from knowing those, even if I have them up on the list. But we do have a lot of search terms here to play with. Let’s do some counting.
After some quick totals, the graph above is what emerges. The values at which they appear are totaled using the number of times they appear along with the number of times they are searched for. After seeing which posts are the most popular, seeing “neckbeard” and “Fedora” at the top is not a shocker. Other combinations that are related to neckbeards are also a bit higher than other terms. Words like “of”, “and”, and “about” were ignored, as well as the two links that appear (I chalk those up to people accidentally putting the link into the search bar, or putting it into the address bar and clicking “search” rather than “go”). I also kept certain phrases together, like “the big bang theory” and “why small talk should be banned”. These are titles, so the chances that the individual words helped them find my blog are low, and probably had to be used altogether to garner the results the reader wanted. Keeping certain results apart, though, is ultimately silly; adding an s or misspelling something slightly wouldn’t (and as it shows here, apparently doesn’t) prevent a reader from finding my blog. So I re-evaluated the results by going through and combining counts. Dedora, although funny, is now combined with fedora. Trench coat, trench coats, trenchcoat, trenchcoats, and yes, even trenchcosts are lumped together; there is a trench that was part of “babies with trench mouth”, which was initially kept separated in anticipation of this result. The new plot is now easier to read and interpret.
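The counting-and-combining routine described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the spreadsheet work actually done for the graphs: the search phrases and their counts in `SEARCHES` are made up, and the alias tables only cover the variants mentioned in the text (plus a plural "neckbeards" as an example of the "adding an s" case).

```python
from collections import Counter

# HYPOTHETICAL (search phrase, times searched) pairs, for illustration only.
SEARCHES = [
    ("neckbeard fedora", 5),
    ("neckbeard dedora", 1),
    ("trench coats", 2),
    ("neckbeard trenchcosts", 1),
    ("the big bang theory", 3),
    ("facts about neckbeards", 1),
]

STOPWORDS = {"of", "and", "about"}  # filler words the post says were ignored
KEEP_TOGETHER = ["the big bang theory", "why small talk should be banned"]
# Multi-word variants are rewritten first, longest spelling first.
PHRASE_ALIASES = [("trench coats", "trenchcoat"), ("trench coat", "trenchcoat")]
WORD_ALIASES = {
    "dedora": "fedora",
    "trenchcoats": "trenchcoat",
    "trenchcosts": "trenchcoat",
    "neckbeards": "neckbeard",
}


def tally(searches):
    """Weight each term by how many times its phrase was searched."""
    totals = Counter()
    for phrase, times in searches:
        text = phrase.lower()
        for title in KEEP_TOGETHER:      # count whole titles as one term
            if title in text:
                totals[title] += times
                text = text.replace(title, " ")
        for variant, canonical in PHRASE_ALIASES:
            text = text.replace(variant, canonical)
        for word in text.split():
            if word not in STOPWORDS:
                totals[WORD_ALIASES.get(word, word)] += times
    return totals


if __name__ == "__main__":
    for term, count in tally(SEARCHES).most_common():
        print(term, count)
```

Emptying the two alias tables gives the "before combining" tally, so the same function covers both plots.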
The nice thing is that this combining did not affect the main rankings. Neckbeards are still at the top (it’s probably the only place you’ll probably ever see them winning, though), followed by fedora and the trenchcoat varieties. Going back to statistical significance, it’s clear to see what can get a reader here by itself, what can get them here if lumped with another word, and what may get a reader here if used exactly as it is, or with a very popular search term. In these cases, it’s not clear what is exactly significant, but, since we have my blog to look at, we can make more assumptions about why these work as opposed to why people in certain countries read my blog (or don’t). Words like “facts” and “autistic” are low on the graph and were paired with “neckbeard”, so I can assume they ultimately have nothing to do with finding the blog, but “why small talk should be banned” is a title of a post on this blog, so it’s directly linking, and thus significant to these results. If people are googling the exact titles to posts, it means they are looking for something again. That’s important to understanding how people got here; even if they don’t have the links, they’re using search engines to find my blog on their own, which directly affects my search rank when people google something related to that search term. For example, someone googled “babies with trench mouth”, then this blog popped up in a search, and they clicked on it, giving me traffic. Unfortunately I don’t have trenchy-mouthed babies here, but, again, a view is a view is a view, and that’s all that matters. The blog popped up in their results because trench is a popular word that people use to find this blog. If enough people, looking for my blog to show someone and forgetting the link, google a title or common words, somewhere else someone is googling something with “beard” or “small talk” or “sleepwalker” and then finding my blog too. 
This kind of intersectionality across a search engine should come as great news for anyone trying to get their blog off the ground. It means that by using common words in tagging, or repeating themes in a blog by using slang or popular words, a writer can expect views to start to rise.
So that pretty much sums up the last 15 months; a few graphs and some guesses as to why this blog got to 2500 views. When will I re-analyze my blog? At 5000 views? 10,000? How about an order of magnitude, which would mean at 25,000 views? I suppose I’ll wait until then. But… how long do I have to wait? Will it be another 15 months? Less than that? More than that?
And thus, I come to the last bit of stats.
Stats are extremely useful, for two reasons that are really only one: first, they can give you an idea of trends over time, and second, they can give you a possible projection for the future. These reasons are a single reason because we crank stats to understand how we can optimize our future and possibly change that projection in a positive way. It’s all about predicting tomorrow, albeit in a dry and unexciting way.
So the last stats fruit I have is the distribution of views over time. I’m going to use the most updated ones, instead of those from two days ago, because in statistics you MUST use the most up-to-date data. Your numbers are already inaccurate if you aren’t doing so (not entirely true, but you get the picture).
The light blue are views, and the dark blue is unique visitors. We’ll ignore visitors for now, and focus on views.
So far, highest views month was this past May. June already has a strong start, but, based on this, will June overshadow May?
May had 463 views, which is nearly a fifth of view share. June has 68, after 5 3/4 days. Let’s be conservative and round it up to 6 days. We can now easily find the views per day for each month; May has nearly 15 views a day (if we assume these are evenly distributed views), and June has a little more than 11. Considering May has 31 days and June has 30, we can make a prediction that June will not see the same number of views as May, although the value will probably be high. This isn’t the FULL picture, but this gives us a good starting place.
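The May-versus-June comparison boils down to two divisions and one multiplication. Here is a small Python sketch of it, using the numbers quoted above and the same simplifying assumption that views are spread evenly across the month; the function names are invented for the example.

```python
# Figures quoted in the post.
MAY_VIEWS, MAY_DAYS = 463, 31
JUNE_VIEWS_SO_FAR, JUNE_DAYS_ELAPSED, JUNE_DAYS = 68, 6, 30


def daily_rate(views, days):
    """Average views per day, assuming views are evenly distributed."""
    return views / days


def project_month(views_so_far, days_elapsed, days_in_month):
    """Naive straight-line projection of a month's total views."""
    return daily_rate(views_so_far, days_elapsed) * days_in_month


if __name__ == "__main__":
    print(f"May:  {daily_rate(MAY_VIEWS, MAY_DAYS):.1f} views/day")
    print(f"June: {daily_rate(JUNE_VIEWS_SO_FAR, JUNE_DAYS_ELAPSED):.1f} views/day")
    june = project_month(JUNE_VIEWS_SO_FAR, JUNE_DAYS_ELAPSED, JUNE_DAYS)
    print(f"Projected June total: {june:.0f} (May had {MAY_VIEWS})")
```

At that pace June lands at roughly 340 views, well short of May's 463, which matches the prediction in the paragraph. A projection from six days of data is shaky, of course, so it is a starting point and nothing more.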
If we look back at last May, in 2013, we notice that June had higher views than May did, but only by a few. June had 115, and May had 97. After June, the views drop, and stay low until October, when they start climbing again. If we assume a similar trend this year, then we can say that May/June is the peak before the summer “drought”. We may even go as far as to start preventing it, by publicizing the blog more. Again, there is more to consider, but now we’re getting somewhere.
What about another metric we can use to analyze these, like, say, posts published in a month? It’s what brings the views in to begin with, right? So if we go back and add these in, we get the following information:
The numbers above the columns indicate the posts in a month. Two things are noticeable here. The first is that there is a month-to-month delay between posts and views, or, expressed simply, it takes a month for a post or posts to bring in viewers. From February 2013 to March 2013, those two posts increased viewership a little. But the 9 posts in March had a more dramatic effect. When I only posted twice in April, the views stayed steady through the next month, and increased in June after posting three times in May. After 1 post in June, I didn’t post again until November, and the effect that had can be clearly seen as a low over a few months. Once I started posting again, the views increased, but something else happened too; older posts became popular. This is indicated by the rise around October. So, with only these numbers in front of us, we can say that posting in a month affects the next month, and that posting something popular will create a momentum over time. If I were pressed for time in a month, but wanted to ensure views the next month, I would come up with some kind of content just to have something there, but if I have the time, I should sit down and create a possibly popular piece, using the information I have already discussed in this post.
I haven’t quite beaten my numbers to death yet, though. There is still more to consider. First off, correlation does not imply causation. If you’re a lay person, you’ve probably heard the phrase but not quite understood what it really means. It means that if I go outside a lot, and I get sunburnt a lot, that the two are positively correlated, and a high occurrence of one means a high occurrence of another. But it absolutely, positively, undoubtedly does not mean that I get sunburnt BECAUSE I go outside. Correlation can be useful for finding trends, but unless you directly observe the cause of something, you ultimately cannot make any statements about it. This is why a lot of biological and medical studies will say things like “We found an increased likelihood of X when patients were Y” instead of directly saying “Y causes X”.
For example, a million things can cause cancer, but many of the cancer factors, especially the main ones, like family history, smoking, obesity, exposure to radiation, etc., can all point to lifestyle decisions or socioeconomic factors. Your dad was a smoker, and so are you, so you both get lung cancer, and it gets written down that smoking and family history play a role. But what often doesn’t get written down, or at least pinpointed in cases like this, is your environment or job. If both of you had jobs where the air quality sucks, it may be the factor that got you and your pops your new diagnosis, but since you both smoked, even if you didn’t smoke much, it blinds people to what may have really caused the cancer in the first place. Alternately, there are studies coming out all the time saying this or that has shown a loss of weight, a decrease in cancer, or something else miraculous, but when you look closely at the studies, they’re often with people who have a sincere interest in increasing their chances of winning whatever their challenge is; it’s why they volunteered for the study in the first place! When a person wants something, even in a highly controlled study, it can throw off the results in a positive way, because that person may be doing just enough tiny little other things that statistically add up. Weight loss studies are hugely skewed because of this. Miracle weight loss drugs are often championed by personal testimonies of those who didn’t work out or change their diet and managed to lose weight anyway, but if you are trying a weight loss regimen, and you feel confident it will work, you’re probably less stressed, more energetic (which will lead to more movement), sleeping more restfully and enjoying a higher quality of life; those things can and will impact weight loss.
Ultimately, you can’t trust that one thing causes another just because the two change in the same or opposite ways. That’s what “correlation does not imply causation” really means. Now, looking back at my stats, another reason for the low dip last summer should be clear. The big hint is this: it was the summer!
Over the summer I stopped writing so much because it was nice outside, so I spent my free time going out more often. That wasn’t an isolated event; everyone goes outside more in the summer. Instead of thinking about my views in terms of when stuff gets posted or how well the posts are received by viewers, it should be considered as a result of several things, one of which is that I wasn’t around writing posts, and that more people were also outside, instead of on their computers reading posts I didn’t write. The views pick back up in October, which is when the weather starts getting cold again. How nice the weather is and my views are inversely correlated, and so are the weather and my tendency to write posts. I can change how much I write, so I can de-correlate my writing frequency from how nice it is outside, which is a way of saying “suck it, nature!”. I can’t change the weather, though, or other people’s choice to enjoy the sunshine. The good news is that I can fight this, because, remember, correlation does not imply causation. The weather isn’t causing people to not read my blog; it just so happens to coincide. In short, I don’t have to find the reason why views dropped, I just have to find positive correlations to views, and then employ them. Like earlier, one way I can do it is to just write more. Another way is to publicize on Facebook or other social media.
Really, how well does publicizing on Facebook work?
The light blue numbers over the posts are the number of times I publicized this blog on Facebook. When I first started out, I publicized a lot, but my blog was starting out; I wanted to increase my stats quickly. Nowadays I don’t need to publicize all that much for my stats to stay high, but I still do it occasionally. And we can make another correlation, one that is more clearly just a correlation and not a causation: a higher number of Facebook posts is related positively to the number of views for the month after. Put simply, past postings on Facebook are followed by a month with higher views. Since there is so much going on, it’s not something we can turn around and quickly state “Facebook posts increase your views the next month”. I know I’m getting referrals from Facebook, so that’s not in question, but when they happen is a different matter entirely, since that post will stay on your Facebook until something causes it to be removed. The smart money is on the views landing in the same month the Facebook post was made. While I can’t know for sure, I do know that posts on Facebook in general are more popular and get more publicity (thanks to THE FEED) right after they’re posted as opposed to a week or a month later. So why the delay?
One explanation could be the generation of repeat readers and the reminder to check the blog regularly. If something doesn’t need to be checked every day, you often forget about it, like a throwaway email account or a webcomic the artist has gone on hiatus from. If there is new content, you wouldn’t notice it immediately… unless you were given a reminder. So a post advertising a blog you actually want to read but doesn’t update on a schedule or extremely often can convince someone to start regularly checking the content again. This may be the effect showing up in the data.
Another reason is that I publicize on Facebook when I feel like I’ve either got some great content or been somewhat prolific. Both of these things seemed to have an effect on views for the next month, so instead of considering publicizing on Facebook as a reason for increased views the following month, we can consider it a result of more or better content, which will get me good views regardless. We still don’t know what the reasons for the increases are, and we don’t know exactly when they will happen; it’s still just a stats game. However, as I’ve said before, a view is a view is a, well, you get the point. Will publicizing on Facebook increase views? Yes. Will it do it for June? Maybe, maybe not, but I do know that it will increase them at some point after I put the link up, and in the long run, that’s all that matters.
I can probably increase June’s stats and reach a goal to beat May’s views by just writing more, and publicizing my blog wherever I can. I can also try to tag posts, as said before, with useful and common words that will show up in searches. I can continue to write Neckbeard Spotlights, and come up with new theme posts that will attract more readers, while improving the number of image media in future posts. I know that Australia is the most avid reader in terms of population percent, and while I’m not sure how to take advantage of that yet, it’s something I should keep in mind (BTW HAVE I SAID AUSTRALIA IS GREAT? IT TOTALLY IS! Please read my blog more.).
But really, do I even need to worry about it? I have an upward trend right now, and while month-to-month it can jump up or down, overall I have an increase over time. So how long do I have to wait until I see 25,000 views?
Now comes the really fun part, at least for me, because after you have considered all of the stats, you get to actually look at the data, come to a conclusion, and then predict, based on what you have learned and can observe, what will happen. This is what science is all about!
What conclusion do I come to? First, I’m going to conclude that I have an upward overall trend. I’m also going to conclude that I can fit a trend; that is, I can fit a functional form to my data that will allow me to predict when I will have 25,000 views.
First, I tried a linear trend, which is the simplest functional form. When I plotted my data to make this trend, I eliminated February 2013 (since it wasn’t a whole month) and this month (since it’s just begun).
I’ve got an R^2 of 0.6665, which isn’t great, but it’ll give me a quick and initial idea of where my views are headed. Since these peaks are views each month, and not totaled, I’ll need to perform some simple integration on the linear function to find the area under the curve, or in this case, find the point at which the area equals 25,000. If you managed to get to integral calculus in math, you’ll recognize the technique. My integral of y should be equal to 25,000, so when I perform the integral on the right side of the equation, I get a polynomial that can be solved pretty quickly for x (which, by the way, if you’re following along, I did not do by hand. There’s a reason Wolfram Alpha and Matlab exist 🙂 ).
When I do all of this, I get x = 47.12. Since my x is in months, I now know, if the views progress in a linear fashion and I don’t do anything differently, that in about 4 years I will reach 25,000 views. That’s… a while to wait.
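For anyone who wants to follow along without Wolfram Alpha, here is a minimal sketch of that step. If the linear fit is y = m*x + b (views per month), then the area under it from month 0 to month x is m*x^2/2 + b*x, and setting that equal to the target gives a quadratic in x. The slope and intercept below are hypothetical placeholders, not the coefficients from my fit, and starting the integral at month 0 is an assumption of the sketch.

```python
from math import sqrt

def months_to_target_linear(slope, intercept, target):
    """Solve slope*x**2/2 + intercept*x = target for the positive root x."""
    if slope == 0:
        # Flat trend: the same number of views every month.
        return target / intercept
    discriminant = intercept ** 2 + 2 * slope * target
    return (-intercept + sqrt(discriminant)) / slope

# Hypothetical coefficients for illustration; NOT the real fit values.
slope, intercept = 10.0, 100.0
x = months_to_target_linear(slope, intercept, 25000)

# Sanity check: the area under the line up to x should equal the target.
area = slope * x ** 2 / 2 + intercept * x
print(f"months needed: {x:.2f}, cumulative views at that point: {area:.0f}")
```

Plug in the real slope and intercept from the trend line and the same function gives the month count for that fit.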
Going back to the fit I made, I can do better than R^2 = 0.6665. By changing the functional form, I can increase the fit… or decrease it. Playing around with fits, I find that a polynomial fit has the highest fit factor, an R^2 of 0.854.
That’s a lot better, and it even looks like a better fit. When you redo all of the math, you get a projection of 32 months, or 2 years, 8 months. That’s still far away, but a much better wait time than the 4 years.
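If you are wondering where an R^2 number comes from, it is 1 minus the ratio of the leftover error (data minus fit, squared and summed) to the total spread of the data around its mean. The sketch below fits a line and a quadratic by ordinary least squares and computes R^2 for each. The `views` list is made up for illustration, the function names are mine, and this is not necessarily how a spreadsheet or Matlab does it internally.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit; returns c where y = c[0] + c[1]*x + ..."""
    n = degree + 1
    # Normal equations: sum_j (sum of x**(i+j)) * c[j] = sum of y * x**i
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        known = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - known) / a[i][i]
    return coeffs

def r_squared(xs, ys, coeffs):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(ys) / len(ys)
    predicted = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Made-up monthly views for illustration only; NOT the blog's real stats.
months = list(range(1, 13))
views = [60, 150, 70, 80, 50, 30, 30, 45, 90, 140, 180, 260]

r2_linear = r_squared(months, views, fit_polynomial(months, views, 1))
r2_quadratic = r_squared(months, views, fit_polynomial(months, views, 2))
print(f"linear R^2: {r2_linear:.3f}, quadratic R^2: {r2_quadratic:.3f}")
```

One thing worth knowing: a higher-degree polynomial can never score a lower R^2 than a lower-degree one on the same data, so a better R^2 by itself does not prove the fancier curve predicts the future any better.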
At this point, it probably sounds like this is it. However, there is something else I can do. For starters, I have already studied the correlation factors for this low period last summer, so it’s something I can combat. More than that, I can assume from now on that I wouldn’t get a low period like this again. If it’s a statistical anomaly, I have the option, nay, the RESPONSIBILITY! to remove it. So I did.
When I remove the first 6 months, I have a power trend, with an R^2 of 0.955. Redo the math again, and I get a prediction of 29.26 months, or a little less than 2 1/2 years away.
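The same "integrate and solve" trick works for a power trend. The sketch below assumes the usual spreadsheet-style form y = a * x^b, which is my assumption about what "power trend" means here. It fits a and b with a straight-line regression on the logs of the data, then solves a * x^(b+1) / (b+1) = target for x. The data points are generated from made-up values of a and b, not from my stats.

```python
from math import exp, log

def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (log x, log y). Needs x, y > 0."""
    log_x = [log(x) for x in xs]
    log_y = [log(y) for y in ys]
    n = len(xs)
    mean_lx = sum(log_x) / n
    mean_ly = sum(log_y) / n
    b = (sum((u - mean_lx) * (v - mean_ly) for u, v in zip(log_x, log_y))
         / sum((u - mean_lx) ** 2 for u in log_x))
    a = exp(mean_ly - b * mean_lx)
    return a, b

def months_to_target_power(a, b, target):
    """Solve a * x**(b + 1) / (b + 1) = target for x (requires b > -1)."""
    return (target * (b + 1) / a) ** (1 / (b + 1))

# Made-up data lying exactly on y = 50 * x**1.5; NOT the blog's real stats.
months = list(range(1, 9))
views = [50 * m ** 1.5 for m in months]

a, b = fit_power_law(months, views)
x = months_to_target_power(a, b, 25000)
print(f"a = {a:.2f}, b = {b:.2f}, months to 25,000 views: {x:.2f}")
```

Fitting on the logs is what spreadsheets typically do for a power trend line, and it weights small months more heavily than a direct fit would, so the coefficients can differ a bit from other tools.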
So what do these fits really mean? Which one should I trust? Well, to be honest, all of them are important. Functional models are tricky. Simple models are the best, because they make the fewest assumptions about the data presented, but they are often riddled with fit errors. Saying the sun will shine every day is a simple model, one that is extremely easy to rely on, but sometimes it rains, making that model fail often. Trying to fit in that rain percentage can increase the accuracy, but you have to make an assumption to get there, meaning you assume it will rain again, and with some kind of frequency close to that of the past rain events. The sun not shining when you say it will is a small mistake, because the sun typically comes out, but rain is much more infrequent, so a wrong rain prediction is a far larger mistake. Eventually, in trying to make the end-all weather forecast model, you manage to fit all past data perfectly, and so your function describes when, where, and how much in lurid detail. And then it immediately fails, because the weather is out of your control, and sometimes shit happens. Much like the weather, my views on this blog are more or less out of my control. I can do things like measure the data, make predictions, understand what is getting views, find out where they are coming from and pray to God that Australia doesn’t suffer some massive apocalyptic-size internet outage (Dear God, please let Road Warrior and Mad Max stay movies, and not future histories). But I can’t make people (besides Luke) read my blog.
So, in the end, I have to take all models seriously, and instead create a range. I should reasonably expect 25,000 views within a period 2 1/2 to 4 years from now, assuming all conditions stay relatively constant.
So what next? I’ve analyzed some data and done some thinking, but what else can I do to the numbers I’ve nearly beaten to death? Bar brawls don’t end when someone walks away; there’s typically the bloody heap of unconscious tavern patrons and no one left standing. Like this imaginary saloon in my head, I’ve run out of drunks to slap around. I’ll just have to wait a few years and see what comes down the views road. Until then, please keep reading the highly offensive and arrogant swill I post on here. Stay classy, Aussies.