Setting up HDFS/Spark on EC2 Instances to Explore the Outbrain Dataset

On Kaggle, Outbrain is sponsoring a competition to build a model to recommend content to its users. One of the files in the released dataset is a behemoth: 100 GB and 2 billion rows of page view data.

I’m not sure what kind of machine you’re operating, but if it’s anything like mine, this file is way too large to store or work with locally. What we need is a distributed storage and compute engine, which is where a cluster of EC2 instances with Hadoop and Spark comes into the picture.

How do we get this setup up and running with the relevant files downloaded? Read and follow along if interested.

Spinning up the Cluster

First things first, we’re going to use Insight Data Engineering’s Pegasus repo to help set up our EC2 cluster and install the necessary technologies. After you git clone the repository to your local machine, it’s time to create YAML files to set up your cluster.

We’ll need two files: one for the master node and one for the worker nodes. In my repo for the competition, see the files master.yml and workers.yml in the peg folder for examples. Some important parameters specified in the files are the number of nodes and the instance type (I used 5 m4.2xlarge instances with an EBS volume of 250 GB just to be safe).

Once we’ve got our YAML files good and ready, it’s time to spin up our instances! I’ve created a shell script that spins up the instances and installs Hadoop and Spark in one fell swoop. You can check it out in the automation folder. To run it, we type the following into the terminal from the project root:

./automation/ outbrain

You’ll notice that the script takes a parameter, which is the name Pegasus assigns to the cluster. I decided to call mine outbrain, but feel free to call yours whatever you choose!

Starting up the cluster and installing Hadoop and Spark can take 10-15 minutes so feel free to use the restroom or refill your coffee mug at this point.

A small quirk of using Pegasus is that in order to use a technology, after installing it you also have to start it. To do that, we’ll use the small script in the automation folder. Same as with the spin-up script, the cluster name is a command-line argument you include when running the script, which just helps make these scripts more reusable for other projects.

To be explicit: ./automation/ outbrain

And with that, we’ve concluded the portion of this DevOps setup that we’re going to do “remotely” through Pegasus. Going forward, we’re going to SSH directly into the master node to download the page_views.csv datafile, unzip it, and place it in HDFS.

Downloading the Datafile

Typically the wget command is used to retrieve files from the internet from the command line. At first attempt, I was confused why pointing wget at the file download link wasn’t working. Then I realized it likely had something to do with needing to log into Kaggle before getting access to the link.

Luckily, there’s a way to pass in login credentials with wget, and save them in a file that can be used for subsequent wget commands. The shell script shows how this is done for the Outbrain competition files.

A couple of things to point out about this script:

First, it takes your Kaggle username and password as arguments. Second, you might be wondering how I knew to pass the username and password as values for the keys ‘userName’ and ‘Password’ respectively. To see how I found the keys Kaggle uses, go to the Kaggle login page (in Incognito mode if you are already logged in to Kaggle).

On the login page (assuming you are using Chrome), right click on the text box you would type your username into and click ‘Inspect’. A screen should pop up as shown in the image below.

[Screenshot: Chrome’s Inspect pane highlighting the username input element]

As you can see in the image, this reveals that the id of this page element is “userName”. The same process was followed to get “Password”.
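In case the form-encoding step is unclear: wget’s --post-data flag takes the login fields as a single key=value&key=value string. Here’s a tiny Python sketch of how that string gets built (the credential values are placeholders, obviously):

```python
from urllib.parse import urlencode

# 'userName' and 'Password' are the element ids found via the Inspect trick above;
# the values here are placeholders for your real credentials.
credentials = {"userName": "your_username", "Password": "your_password"}
post_data = urlencode(credentials)
print(post_data)  # userName=your_username&Password=your_password
```

That output string is exactly what gets handed to wget alongside the cookie-saving flags.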

Since I didn’t want to include the public DNS of my cluster’s master node in this tutorial, I sourced the value from an address file. Run the script like so:

./automation/download_file.sh kaggle_username kaggle_password

with your username and password substituted in, and the file should start downloading and eventually be placed onto HDFS! From there you’ll be ready to follow along with tutorials like this excellent one that explore and make use of the full page_views.csv file included in this competition.

Good luck!

Holding Your Hand Like a Small Child Through A Neural Network (Part II)

For Part I of this riveting series, click here.

In Part I, we went through each calculation by hand of a forward and backward pass through a simple single-layer neural network.

To start Part II, we’re going to do the same for the second pass through. My hope is that after doing this a second time, trends will emerge and we will be able to understand how the network’s weights end up where they do by the 100,000th pass.

Since the first pass was called iteration #0, we begin with iteration #1:

---------ITERATION #1-------------
inputs:
 [[0 0 1]
 [0 1 1]
 [1 0 1]
 [1 1 1]]

weights:
 [[ 0.67423821]

dot product results:
 [ 0.26954282]

l1 probability predictions (sigmoid):
 [[ 0.40018475]
 [ 0.32312967]
 [ 0.56698066]
 [ 0.48370881]]

Compared to the first pass, the first weight is larger and the other two weights are smaller in magnitude. We’ll see if these updated weights cause less error in our predictions (Spoiler: They will).

Although you should be able to do dot products in your sleep at this point since you followed along so closely with Part I of the series, I’ll walk us through the dot product again:

(0 * .674) + (0 * -.335) + (1 * -.404) = -.4047
(0 * .674) + (1 * -.335) + (1 * -.404) = -.7394
(1 * .674) + (0 * -.335) + (1 * -.404) = .2695
(1 * .674) + (1 * -.335) + (1 * -.404) = -.0652

Great. Now we run the results through the sigmoid function to generate probability predictions (shown as “l1 probability predictions (sigmoid)” above).
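If you’d rather let numpy do the checking, here’s a quick sketch of this forward pass using the weights rounded to the precision of the hand calculations above (so the results match the printed values to about three decimal places):

```python
import numpy as np

# Forward pass for iteration #1, with the weights rounded as in the text.
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
weights = np.array([[0.674], [-0.335], [-0.404]])

dot = X.dot(weights)              # 4x3 · 3x1 -> 4x1 raw scores
preds = 1 / (1 + np.exp(-dot))    # sigmoid squashes scores into (0, 1)

print(dot.ravel())                # ≈ [-0.404 -0.739  0.270 -0.065]
print(preds.ravel())              # ≈ [ 0.400  0.323  0.567  0.484]
```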

For nostalgia’s sake, here were our predictions from the previous pass:

OLD l1 probability predictions (sigmoid):
 [[ 0.36672394]
 [ 0.27408027]
 [ 0.46173529]
 [ 0.35868411]]

If you compare the old predictions with the new ones, you’ll notice that they all simply went up, meaning the model thinks each example is more likely to be a one than before.

In terms of error, it hasn’t improved much from the last run.

OLD l1_error:
 [[-0.36672394]
 [-0.27408027]
 [ 0.53826471]
 [ 0.64131589]]
NEW l1_error:
 [[-0.40018475]
 [-0.32312967]
 [ 0.43301934]
 [ 0.51629119]]

Calculating the sum of the absolute value of the four errors, it did decrease from 1.82 to 1.67. So there was improvement!
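That 1.82 → 1.67 improvement is easy to verify from the printed predictions (plain Python; the targets are y = [0, 0, 1, 1] from Part I):

```python
# Checking the error sums quoted above, using the printed predictions
# and the targets y = [0, 0, 1, 1].
y = [0, 0, 1, 1]
old_preds = [0.36672394, 0.27408027, 0.46173529, 0.35868411]
new_preds = [0.40018475, 0.32312967, 0.56698066, 0.48370881]

old_total = sum(abs(t - p) for t, p in zip(y, old_preds))
new_total = sum(abs(t - p) for t, p in zip(y, new_preds))
print(round(old_total, 2), round(new_total, 2))  # 1.82 1.67
```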

Unlike in Part I, I’m not going to dive into the details of how taking the derivative of the sigmoid at the spot of the probability prediction, multiplying the result by the errors, and then taking the dot product of the result with the inputs leads to updating the weights in a way that will reduce prediction error… but instead just skip to the updated weights:

pre-update weights:
 [[ 0.67423821]

post-update weights:
 [[ 0.90948611]

As we’ve come to expect, the weight on the first input got larger and the other two shrank in magnitude.

Let’s take a look at how the sum of the errors decreases over the first 100 iterations:


Now the first 1000 iterations:


Seems like we hit an “elbow point” around the 100th iteration. Let’s see how this same graph looks over 10,000 iterations:


Even more dramatic. So much of the effort (computational resources for those who don’t like to personify their processors) goes towards decreasing the final error by tiny, tiny amounts.

For the last graph, let’s see where we end up after 100,000 iterations:


The value of the error after 10,000 iterations is 0.03182. After 100,000 it is 0.00995, so the error is certainly still decreasing. From the graph above, though, it is easy to make the argument that the additional training loops are not worth it, since we get most of the way there in just a few hundred iterations.

Where did the weights end up? Great question! Let’s have a peek:

weights (after 100,000 iterations):
 [[ 12.0087]

Not surprisingly, the size of the first weight has grown to be the largest. What does, in fact, surprise me is the relatively large weight on the third input (large weights, even if negative, still have an impact on the predictions).

One thing to note is that the inputs corresponding to the third weight are all ones, making it effectively like adding a bias unit to the model. Viewed in that way, it is less surprising to see the large-ish third weight.

One more time, let’s run through the predictions produced from these weights. We start with the dot product of the weights and the input:

dot product results:
(0 * 12.00) + (0 * -.20) + (1 * -5.8) = -5.8
(0 * 12.00) + (1 * -.20) + (1 * -5.8) = -6.0
(1 * 12.00) + (0 * -.20) + (1 * -5.8) = 6.2
(1 * 12.00) + (1 * -.20) + (1 * -5.8) = 6.0

Those results make an overwhelming amount of sense. Let’s apply the sigmoid function:

l1 probability prediction (sigmoid):
1/(1+e^-(-5.8)) = 0.003
1/(1+e^-(-6.0)) = 0.002
1/(1+e^-(6.2)) = 0.998
1/(1+e^-(6.0)) = 0.997
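If you want to sanity-check those numbers, the sigmoid is a one-liner in Python (the 0.997 printed above is 0.9975 before rounding):

```python
import math

# sigmoid squashes the raw dot product scores into probabilities in (0, 1)
def sigmoid(x):
    return 1 / (1 + math.exp(-x))

for score in (-5.8, -6.0, 6.2, 6.0):
    print(f"{score:5.1f} -> {sigmoid(score):.4f}")
```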

Hopefully that makes it a little more obvious why the error is so low. Only took 100,000 tries🙂

Jupyter notebook for this article on GitHub.

Stay tuned next time when we add another layer and dive into the details of a more legit backprop example!

Holding Your Hand Like a Small Child Through A Neural Network (Part I)

For those who do not get the reference in the title: Wedding Crashers.

For those trying to deepen their understanding of neural nets, IAmTrask’s “A Neural Network in 11 lines of Python” is a staple piece. While it does a good job–a great job even–of helping people understand neural nets better, it still takes significant effort on the reader’s part to truly follow along.

My goal is to do more of the work for you and make it even easier. (Note: You still have to exert mental effort if you actually want to learn this stuff, no one can replace that process for you.) However I will try to make it as easy as possible. How will I do that? Primarily by taking his code and printing things, printing all the things. And renaming some of the variables to clearer names. I’ll do that too.

Link to my code: what I call the Annotated 11 line Neural Network.

First, let’s take a look at the inputs for our neural network, and the output we are trying to train it to predict:

| Inputs | Outputs |
|  0,0,1 |       0 |
|  0,1,1 |       0 |
|  1,0,1 |       1 |
|  1,1,1 |       1 |

Those are our inputs. As Mr. Trask points out in his article, notice the first column of the input data corresponds perfectly to the output. This does make this “classification task” trivial since there’s a perfect correlation between one of the inputs and the output, but that doesn’t mean we can’t learn from this example. It just means we should expect the weight corresponding to the first input column to be very large at the end of training. We’ll have to wait and see what the weights on the other inputs end up being.

In Mr. Trask’s 1-layer NN, the weights are held in the variable syn0 (synapse 0). Read about the brain to learn why he calls them synapses. I’m going to refer to them as the weights, however. Notice that we initialize the weights with random numbers that are supposed to have mean 0.

Let’s take a look at the initial values of the weights:

 [[ 0.39293837 ]

We see that they, in fact, do not have a mean of zero. Oh well, c’est la vie, the average won’t come out to be exactly zero every time.

Generating Our First Predictions

Let’s view the output of the NN variable-by-variable, iteration-by-iteration. We start with iteration #0.

---------ITERATION #0-------------
 [[0 0 1]
 [0 1 1]
 [1 0 1]
 [1 1 1]]

Those are our inputs, same as from the chart above. They represent four training examples.

The first calculation we perform is the dot product of the inputs and the weights. I’m a crazy person (did I mention that?) so I’m going to follow along and perform this calculation by hand.

We’ve got a 4×3 matrix (the inputs) and a 3×1 matrix (the weights), so the result of the matrix multiplication will be a 4×1 matrix.

We have:

(0 * .3929) + (0 * -.4277) + (1 * -.5463) = -.5463

(0 * .3929) + (1 * -.4277) + (1 * -.5463) = -.9740

(1 * .3929) + (0 * -.4277) + (1 * -.5463) = -.1534

(1 * .3929) + (1 * -.4277) + (1 * -.5463) = -.5811
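Here’s the same calculation handed off to numpy, whose dot function does exactly this rows-times-columns arithmetic (weights rounded to the four decimals used above):

```python
import numpy as np

# 4x3 inputs · 3x1 weights -> 4x1 result, same as the hand calculation.
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
syn0 = np.array([[0.3929], [-0.4277], [-0.5463]])

print(X.dot(syn0).ravel())  # ≈ [-0.5463 -0.974  -0.1534 -0.5811]
```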

I’m sorry I can’t create fancy graphics to show why those are the calculations you perform for this dot product. If you’re actually following along with this article, I trust you’ll figure it out. Read about matrix multiplication if you need more background.

Okay! We’ve got a 4×1 matrix of dot product results and if you’re like me, you probably have no idea why we got to where we’ve gotten, and where we’re going with this. Have patience for a couple more steps and I promise I’ll guide us to a reasonable “mini-result” and explain what just happened.

The next step according to the code is to send the four values through the sigmoid function. The purpose of this is to convert the raw numbers into probabilities (values between 0 and 1). This is the same step logistic regression takes to provide its classification probabilities.


Element-wise, as it’s called in the world of matrix operations, we apply the sigmoid function to each of the four results we got from the matrix multiplication*. Large values should be transformed to something close to 1. Large negative values should be transformed to something close to 0. And numbers in between should take on a value in between 0 and 1!

*I’m using the terms matrix multiplication and dot product interchangeably here.

Although I calculated the results of applying the sigmoid function “manually” in Excel, I’ll defer to the code results for this one:

dot product results:
 [[-0.5463]
 [-0.9740]
 [-0.1534]
 [-0.5811]]

probability predictions (sigmoid):
 [[ 0.36672394]
 [ 0.27408027]
 [ 0.46173529]
 [ 0.35868411]]

So we take the results of the dot product (of the initial inputs and weights) and send them through the sigmoid function. The result is the “mini-result” I promised earlier and represents the model’s first predictions.

To be overly explicit, if you take the first dot product result, -.5463, and input it as the ‘x’ in the sigmoid function, the output is 0.3667.

This means that the neural network’s first “guesses” are that the first input has a 36.67% chance of being a 1. The second input has a 27.41% chance, the third a 46.17% chance, and the final and fourth input a 35.87% chance.

All of our dot product results were negative, so it makes sense that all of our predictions were under 50% (a dot product result of 0 would correspond to a 50% prediction, meaning the model has absolutely no idea whether to guess 0 or 1).

To provide some context, the sigmoid is far from the only function we could use to transform the dot product into probabilities, though it is the one with the nicest mathematical properties, namely that it is differentiable and its derivative, as we’ll see later, is mind-numbingly simple.
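To make that concrete: if s(x) is the sigmoid, its derivative is simply s(x) * (1 - s(x)). A quick numerical check (plain Python, helper names my own):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_deriv(x):
    s = sigmoid(x)
    return s * (1 - s)      # the mind-numbingly simple derivative

# compare against a centered finite difference at a few points
for x in (-2.0, 0.0, 1.5):
    h = 1e-6
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    assert abs(numeric - sigmoid_deriv(x)) < 1e-6
```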

Calculating Error and Updating Weights

We’ve generated our first predictions. Some were right, some were wrong. Where do we go from here? As I like to say, we didn’t get this far just to get this far. We push forward.

The next step is to see how wrong our predictions were. Before your mind thinks of crazy, complicated ways to do that, I’ll tell you the (simple) answer. Subtraction.


l1_error:
 [[-0.36672394]
 [-0.27408027]
 [ 0.53826471]
 [ 0.64131589]]

The equation to get l1_error is y – probability predictions. So for the first value it is: 0 – .3667 = -.3667. Simple, right?

Unfortunately, it’s going to get a little more complicated from here. But I’ll tell you upfront what our goals are so what we do makes a little more sense.

What we’re trying to do is update the weights so that the next time we make predictions, there is less error.

The first step for this is weighting the l1_error by how confident we were in our guess. Predictions close to 0 or 1 will have small update weights, and predictions closer to 0.5 will get updated more heavily. The mechanism we use to come up with these weights is the derivative of the sigmoid.

sigmoid derivative (update weight):
 [[ 0.23223749]
 [ 0.19896028]
 [ 0.24853581]
 [ 0.23002982]]

Since all of our l1 predictions were relatively unconfident, the update weights are relatively large. The most confident prediction was that the second training example was not a one (only a 27.41% chance of being a one), so notice that it has the smallest update weight.

The next step is to multiply the l1_errors by these update weights, which gives us the following result that Mr. Trask calls the l1_delta:

l1 delta (weighted error):
 [[-0.08516705]
 [-0.05453109]
 [ 0.13377806]
 [ 0.14752178]]
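You can recompute those deltas in a couple of lines; the update weight for a prediction p is p * (1 - p), and the delta is the error times that:

```python
# Recomputing the l1_delta values from the printed predictions:
# for each example, delta = (target - prediction) * prediction * (1 - prediction).
preds = [0.36672394, 0.27408027, 0.46173529, 0.35868411]
y = [0, 0, 1, 1]

deltas = [round((t - p) * p * (1 - p), 4) for t, p in zip(y, preds)]
print(deltas)  # [-0.0852, -0.0545, 0.1338, 0.1475]
```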

Now we’ve reached the final step. We are ready to update the weights. Or at least I am.

We update the weights by adding the dot product of the input values and the l1_deltas.

Let’s go through this matrix multiplication manually like we did before.

The input values are a 4×3 matrix. The l1_deltas are a 4×1 matrix. In order to take the dot product, we need the No. of columns in the first matrix to equal the No. rows in the second. To make that happen, we take the transpose of the input matrix, making it a 3×4 matrix.

Original inputs:
 [[0 0 1]
 [0 1 1]
 [1 0 1]
 [1 1 1]]
transpose of inputs:
 [[0 0 1 1]
 [0 1 0 1]
 [1 1 1 1]]

(To take the transpose of a matrix, you flip it over its main diagonal, so that rows become columns.)

We’re multiplying a 3×4 matrix by a 4×1, so we should end up with a 3×1 matrix. This makes sense since we have 3 weights we need to update. Let’s begin the calculations:

(0 * -.085) + (0 * -.055) + (1 * .134) + (1 * .148) = 0.282
(0 * -.085) + (1 * -.055) + (0 * .134) + (1 * .148) = 0.093
(1 * -.085) + (1 * -.055) + (1 * .134) + (1 * .148) = 0.142

Okay! The first row must correspond to the update to the first weight, the second row to the second weight, etc. Unsurprisingly, the first weight (corresponding to the first column of inputs that is perfectly correlated with the output) gets updated the most, but let’s better understand why that is.

The first two l1 deltas are negative and the second two are positive. This is because the first two training examples have a true value of 0, and even though our guess was that they were more likely 0 than 1, we weren’t 100% sure. The more we move that guess towards 0, the better the guess will be. The converse logic holds true for the third and fourth inputs which have a true value of 1.

So what this operation does, in a very elegant way, is reward the weights by how accurate their corresponding input column is to the output. There is a penalty applied to a weight if an input contains a 1 when the true value is 0. Inputs that correctly have a 0 don’t get penalized because 0 * the penalty is 0.
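In numpy, that whole elegant operation is one line: the transpose of the inputs dotted with the l1_deltas (here rounded to three decimals, as in the hand calculation):

```python
import numpy as np

# The weight updates: transpose of the inputs dotted with the l1_deltas.
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
l1_delta = np.array([[-0.085], [-0.055], [0.134], [0.148]])

update = X.T.dot(l1_delta)
print(update.ravel())  # ≈ [0.282 0.093 0.142]
```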

pre-update weights:
 [[ 0.39293837]

post-update weights:
 [[ 0.67423821]

With this process, we go from the original weights to the updated weights. With these updated weights we start the process over again, but stay tuned for next time where we’ll see what happens in the second iteration!
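To tie it all together, here’s the whole Part I process condensed into one training loop. It’s essentially Mr. Trask’s network with the variable names used in this walkthrough; seeding numpy’s random generator with 123 appears to reproduce the 0.39293837 initial weight shown above (my inference from the printed numbers, not something stated in the original):

```python
import numpy as np

np.random.seed(123)                           # appears to reproduce the initial weights above
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
y = np.array([[0, 0, 1, 1]]).T
weights = 2 * np.random.random((3, 1)) - 1    # random init, roughly mean 0

for _ in range(10000):
    preds = 1 / (1 + np.exp(-X.dot(weights)))   # forward pass (sigmoid)
    l1_error = y - preds                        # how wrong were we?
    l1_delta = l1_error * preds * (1 - preds)   # weight each error by confidence
    weights += X.T.dot(l1_delta)                # update the weights

print(weights.ravel())   # first weight large and positive, as expected
print(preds.ravel())     # first two near 0, last two near 1
```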

View my handy Jupyter Notebook here.

Using networkx to Make Friend Recs

Two things that all the cool kids are doing these days: 1) making social graphs and 2) recommending things to people who never asked. While I never claimed to be cool or anything, let’s try these things ourselves to see what the fuss is about and maybe learn something along the way.


For this project we’re going to be using the Python networkx graph library. The main idea is to organize connections between individuals in a graph structure where people = nodes, and connections between them = edges. Nodes and edges, edges and nodes. Got it? Great.

The standard procedure is to import networkx like so:

import networkx as nx


The input data we’ll be using is a small file with data in two columns. The data in the columns represent people, and if two people are in the same row it means they are connected. This is the network we’ll be using for the exercise.

[Screenshot: the two-column friend-pair CSV]

Loading the Dataset

To load the dataset, we’ll first use Pandas to put it in a DataFrame object. From there, we’ll use networkx’s from_pandas_dataframe function to turn it into a graph.

# Load the dataset into pandas and instantiate graph object
import pandas as pd

filename = 'simple.csv'
df = pd.read_csv(filename)
G = nx.from_pandas_dataframe(df, 'Friend A', 'Friend B')

From the code above, you can see the variable named G is the graph object.

Drawing the Graph

Drawing a full network graph is highly discouraged for large networks. It’s computationally expensive, and the result is often more of a clusterf*&$ than a useful or aesthetically pleasing image.

Since we’re working with a tiny network, we will draw ours using primarily the nx.draw function.

pos = nx.spring_layout(G)
nx.draw(G, pos)
nx.draw_networkx_labels(G, pos)


Voila! A lovely looking graph indeed. We’ve got 8 users with 9 connections among them. We can simply eyeball these values with our network, but for larger networks you can use the number_of_nodes and number_of_edges functions.

G.number_of_nodes() # 8

G.number_of_edges() # 9

Note that it is considered acceptable to draw subgraphs of a graph, especially to highlight a particular user’s connections or a well-defined cluster.

A Recommendation Algorithm

Enough fooling around, let’s get down to business. Our business is making friend recommendations, and cousin, business-is-a-boomin’.

There are many ways, or more specifically, many metrics one can use to judge the importance of members of a network. There’s betweenness centrality, eigenvector centrality, and a whole host of other centralities listed on that Wikipedia page.

Feel free to dive into the algorithms and mathematical details of those, you over-achiever you. What our algorithm is going to do is simpler. To generate a friend recommendation for a user, we will do three steps:

  1. Find all of the friends-of-friends of a user
  2. Remove friends-of-friends that the user is already friends with
  3. Recommend the friend-of-friend with the most connections

That’s it! It’s so simple! In theory at least; coding it (in a nice way) is a little more complicated. Here’s my implementation (borrowed in part from this repository):

First, finding friends of friends:

[Code screenshot: the friends-of-friends function]

We start by calling a function that returns a list of friends by using the neighbors() function:

[Code screenshot: helper that returns a user’s friends via neighbors()]

Then for each friend, we call the same neighbors() function and add them to a set that comprises all of the friends-of-friends for a user.

[Code screenshot: building the friends-of-friends set]

Next, we check if there’s a connection between the user and every friend-of-friend with the graph’s has_edge() method. If there’s a connection, we add them to the “remove set” since we don’t want to recommend anyone the user is already friends with.

Python sets have a handy method available for this task: difference_update(). As the handy Python documentation explains:

s.difference_update(t) # updates set s in place, removing elements found in t

Always read the docs, folks. Don’t pay someone $99 to read them for you.

You’ll notice the difference update between the two sets in line 74 of the code pasted above.

Anyway, now we’ve got all the friends-of-friends that the user is not friends with. Time to recommend the friend with the most connections. We’ll do that by sorting the list of friends-of-friends by the degree of each user. Here’s the function that does that:

[Code screenshot: sorting candidates by degree]

In this code, I returned the top five highest-degree users to generate five recommendations. Since we’re working with a considerably smaller network for this example, I’m going to alter the code to output just one.
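Since the code screenshots don’t reproduce well here, this is a minimal sketch of the same three-step algorithm, using a plain dict of sets in place of a networkx graph so it stands alone (the function names and the toy graph are mine, not from the original code):

```python
# graph[u] is the set of u's friends (a stand-in for nx neighbors()/has_edge())
def friends_of_friends(graph, user):
    fof = set()
    for friend in graph[user]:
        fof |= graph[friend]                 # step 1: gather friends-of-friends
    fof.discard(user)                        # you are not your own candidate
    fof.difference_update(graph[user])       # step 2: drop existing friends
    return fof

def recommend(graph, user):
    candidates = friends_of_friends(graph, user)
    # step 3: recommend the candidate with the most connections (highest degree)
    return max(candidates, key=lambda c: len(graph[c]), default=None)

# a made-up five-person network
graph = {
    1: {2, 3},
    2: {1, 3, 4},
    3: {1, 2},
    4: {2, 5},
    5: {4},
}
print(recommend(graph, 1))  # 4 is user 1's only friend-of-friend
```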

Recommending a Friend for User # 6

User #6 wants a friend! Let’s use our algorithm to algorithmically recommend him or her one. Look at the graph above: who do you think should be recommended to User 6?

>> paulsingman$ python 6
Friend recommendations for user 6:  2

Alright! The algorithm worked as expected! User 6 has three friends-of-friends in the network: User 4, User 8, and User 2.

User 4 has a degree of 2, User 8 has a degree of 1, and User 2 has the highest degree of all at 3.

Here’s our new graph:


It looks quite different (networkx computes a new layout each time you draw it) but I swear it’s the same one. User 2 is quite the super user, don’tcha think?

Full code on GitHub.

NYL Gyft Express

In October, New York Life held its first Hackathon. The premise: come up with an innovation to improve the customer experience at NYL. With my partner Dan from tax compliance, I used the 36 hours allotted to create an MVP to present.

The result? NYL Gyft Express, a simple webapp allowing agents to send gifts to clients anywhere in the country in under an hour, as the home page shows:

[Screenshot: NYL Gyft Express home page]

From here, all an agent has to do is log in, pick their city, fill out a short form with the client’s name and address, and success! The gift will be on its way.

[Screenshot: the gift order form]

The benefit of placing such orders through the webapp is that budgets for each agent can easily be tracked, and all orders can conveniently be stored in a database for later analysis.

We believed NYL Gyft Express was a valuable tool for three reasons:

  1. Word of mouth advertising is the best form of advertising. And sending gifts via NYL Gyft Express is exactly the kind of thing to get people talking about NYL (not easy for a life insurance company).
  2. It’s important for our agents to feel empowered, and to feel that they are supported with the latest tools and technology. NYL Gyft Express isn’t just leveraging a technology that no other insurance company is using, but one that no other company is using, period.
  3. Finally, we believe it is worth it for companies to go above-and-beyond when it comes to customer service. In this age of rapid information spread, great customer service is exactly the kind of thing that will spread and ultimately pay off for the company providing it.

Overall, the NYL Hackathon was a fun, challenging, and great learning experience. Although I struggled a bit at first, eventually I got the hang of creating views in Flask and sending variables between them. Most of my API experience had been with Yahoo’s convoluted, archaic API, so using Postmates was a relative breeze.

Currently, NYL is exploring options for implementing a similar type of service for the families of clients that are dealing with the loss of a loved one. A surprisingly low percentage of death claim money is reinvested back into the company, and the unpleasant claims process currently in place might have something to do with it.

Code on GitHub


A Look at Movie Revenues

This is an exercise for me in using R to make graphs and such. Some of it you may find interesting, but there’s no set goal to this article.

Daily movie revenues were collected for the years 2002-2013. The first thing I did was sum up the daily revenues for each movie to get its total gross. The top five are shown here:


Not the greatest output format, but it’ll do for now. If you’re curious why Avatar is listed as having made only $750,000,000, the answer is that this is domestic receipts only. It made only about an extra $2 billion overseas, so I think it’s fine if we ignore foreign gross.

Here’s a quick line graph of when these movies made their money:


I tried to get a legend in this plot, but sizing it was an issue, so I simply scrapped the legend idea altogether. Once again the Y axis is mucked up, and I should probably limit the X axis. I could do these things, but instead, let’s move forward.

Next, let’s compute total receipts for each day. This will be done using the aggregate function in R, summing by date. Here’s an unpleasing line graph of all the daily totals:


To convert these daily totals into weekly totals would be quite the challenge. Luckily I have code to do so, but it is both discouraging and safe to say that without such help, I would not have been able to figure it out. Needless to say, with a little help from my friends, I now have weekly box office receipts as well. This was done with Friday as the first day of the week, to capture weekends uniformly. Here’s a less neurologically paralyzing line chart of weekly box office receipts:



If I were trying to be professional about this, the first thing to change would be the Y axis labels; R defaulting to scientific notation is rather silly. Is there a slight uptick in the data? I think there is, ever so slightly, since I have not controlled for inflation, something I am well capable of doing, believe it or not. But I won’t use inflation-adjusted figures just yet.

First I want to average the weekly grosses across years to get an average weekly gross. And here that chart is!


Sigh, the end of the year comes out putridly because of how the final days of the year are numbered, but this works. I think. We got those nice summer, Thanksgiving, and Christmas bumps, so it appears I have succeeded in something. Yea yea, the title is cut off, whatever. I ain’t publishing this in a scientific journal. This was an exercise for me; I just want to publish it and move on to other things.



Tout Wars Mixed Auction Recap


Lookin’ good, Paul

This weekend marked my third time participating in the Tout Wars Mixed auction. As with all leagues, the passage of each year increases my comfort with its unique settings and the individual quirks of the participants. So last year’s auction was a little easier than the year before, and this year’s was easier than the last. Still, you never know exactly how an auction will play out, and what’s fresh in your mind right after an auction fades over the course of the following year, so it all remains a slightly unnerving experience at the outset, no matter how many years of experience you have under your belt.

I came into the auction without a specific strategy in mind, but based on my dollar values I knew a couple of things were likely: 1) I would end up with a top-tier hitter and 2) I wasn’t likely to get a top-tier pitcher. This turned out to be true, as I landed Paul Goldschmidt for $39 early in the draft and was outbid on all pitchers one would label “elite”. Anyhoo, here’s how the roster ended up shaking out:

Pos Player Price Value
C Jason Castro 12 16
C Derek Norris 5 7
1B Paul Goldschmidt 39 40
2B Jed Lowrie 6 9
SS Jean Segura 19 21
3B Evan Longoria 27 30
CI Josh Donaldson 20 25
MI Chris Owings 3 4
OF Austin Jackson 14 18
OF Shane Victorino 11 13
OF Christian Yelich 10 10
OF Michael Cuddyer 9 16
OF Nick Castellanos 3 3
UT Xander Bogaerts 9 9
P Shelby Miller 12 15
P Jeff Samardzija 7 14
P Rick Porcello 7 10
P Yordano Ventura 6 13
P Corey Kluber 6 9
P Alex Wood 5 8
P Archie Bradley 3 8
P Joe Nathan 15 15
P Ernesto Frieri 12 13

In the above table Price is the price I paid for the player and Value is what I had him listed for in my rankings. As you can see I didn’t have to go over my dollar value on any player, though I did get Christian Yelich (meh) for his $10 price, and Xander Bogaerts for $9. Both purchases occurred later in the draft when I had more money than I wanted, and was looking to buy every player right up to his listed amount. Yes, that does mean there are times when I don’t bid a player up to his listed amount, either because I don’t need a player at a certain position or because I see numerous players going for a few dollars below my listing (and I assume, sometimes wrongly, that trend will continue).

I got my two closers, and despite not grabbing a top name pitcher I’m extremely pleased with the depth of my staff (although who isn’t). My hitting is a little light on power but should be competitive and yadda yadda I like my team. Shocking.

One thing that stood out to me is the different approaches the different participants took to bidding. There were two basic strategies: 1) Either being in on players from the beginning or 2) Waiting for a “Going once, going twice” to finally jump in on the action. It’s always funny when two people utilize Strategy No. 2 on the same player and blurt out “Fourteen!” at the last second and then enter a bidding war for that player. Yea, we all know you wanted him all along.

There’s an advantage to always being in on most players: you’re there when the bidding unexpectedly (in a good way) stops and you land a player for a good value. You may not think that one dollar matters, but I guarantee at least 100 times per auction a player is going once, going twice for, say, $9 and someone else is thinking to themselves, “Man, I would have liked to have landed him for nine, but I’m not bidding up to ten.” And that, my friends, is the difference between getting a player and not getting a player, which I don’t think requires an explanation of the impact it can have on your draft.

So there’s an upside to being in on players, and as with most upsides there’s also a downside: there’s a chance you’ll wind up stuck with a player you’re not happy to be stuck with. Not to drag out the flip side of this: Strategy No. 2, bidding only when you have to, won’t lead you to make as many regrettable purchasing decisions, but there’s a greater chance of FOMO (fear of missing out) on a player. You have to strike your own personal balance of how “in” on players you want to be, and most of that draws back to confidence in yourself and your rankings. There’s nothing terribly wrong with taking either an aggressive or a passive approach to the auction, so I’ll leave that decision up to you.

One last minor point about auctions in general that I want to bring up, and it’s regarding how you should increment your bids, particularly on a player you really want. Say you’re a big Rays fan and are dead-set on landing Evan Longoria for your listed price of $30, though not a penny more. He’s nominated at $24, and you quickly jump in with a bid of $25. And so it continues to $26…$27…$28…you offer $29… going once… going twice…but right before you hear that glorious “Sold” the other person shouts “Thirty Dollars!”. “Goddamnit,” you mutter to yourself, and Longoria is sold to another owner. What happened here isn’t that someone truly outbid you for Longoria; they simply got the timing of their bid right, presuming both of you had him listed for $30.

This may seem like a rare enough occurrence to ignore (especially because you don’t usually ever know whether or not the other guy would have kept bidding up even if you were the one in turn to bid $30), but even if it happens a few times per draft, I would deem it worth paying attention to. The simple rule is to bid on the evens for players you value at an even dollar amount, which will ensure your best odds of landing that player, assuming you are willing to bid right up to your dollar amount.

Yes, this strategy is thwarted by simple non-incremental bidding (non-incremental bidding being your tool to get “on track” to bid your max on a player). But once you know how stupid you feel when a player you want goes for exactly your dollar amount, you’ll start to pay attention to these things.

It should be another fun year, hopefully I come away with my first title, and at least I hope my team stays competitive throughout the season. I’ve already started doing my daily prayers and sacrifices to the Injury Gods, who have served me fairly well in the past. Thank you to Peter for running the league once again and all the participants who make it a great experience.

Play ball!