# A look at the adjusted closing price of stocks from January 1, 2015 to September 28, 2015 - a programming assignment

The code in this repo is the result of an interview exercise that was given to me. It involves stocks, some basic stats and a bit of web scraping.

The project is available on GitHub at juandes/Stocks-StandardDeviation-Assignment.


## Overview

The work presented in this report is a coding assignment that was given to me during the hiring process for a data engineer position. Because the assignment was kind of cool, I decided to turn it into a report.

The task of this assignment was to find the S&P 500 stock with the highest standard deviation of its adjusted close price for the period from January 1, 2015 to September 28, 2015.
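To make the target quantity concrete: writing $p_1, \dots, p_n$ for a stock's adjusted close prices on the $n$ trading days of the period (notation introduced here for illustration), the sample standard deviation $s$ is defined below. The $n - 1$ denominator matches pandas' `Series.std()`, whose `ddof` parameter defaults to 1. The answer to the assignment is the stock with the largest $s$.

$$
\bar{p} = \frac{1}{n} \sum_{t=1}^{n} p_t,
\qquad
s = \sqrt{\frac{1}{n-1} \sum_{t=1}^{n} \left( p_t - \bar{p} \right)^2}
$$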

### Tools used

• Python and Pandas
• R and rvest for scraping the stocks list

## The solution

As mentioned previously, the purpose of this assignment was to find the S&P 500 stock with the highest adjusted close price standard deviation during a given period of time. So the first part of the problem is to obtain the list of stocks, since it was not given and it is kind of silly to hard-code them (there are a bit over 500 stocks). To do this, I used R's rvest web scraping package.

### Scraping the stocks list

The data was scraped from the Wikipedia page List of S&P 500 companies. If you look at the page, it has a table with the 505 common stocks. R's rvest package works by specifying the CSS selector that matches the data we want. To find the CSS selector I needed, I used the SelectorGadget widget. If you use the gadget on the page I linked before, you will see that the selector matching the stock symbol is `tr:nth-child(i) td:nth-child(1)`, where `i` is the position of the stock's row in the table, starting from 2 because the first row is the header.

This is the R script for scraping the stocks.

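```r
library(rvest)

# Download the Wikipedia page "List of S&P 500 companies"
stocks.site <- read_html("https://en.wikipedia.org/wiki/List_of_S%26P_500_companies")

# One-column data frame that will hold the stock symbols
stocks <- data.frame(stock = character(), stringsAsFactors = FALSE)

# Rows 2 to 506 of the table hold the 505 stocks (row 1 is the header)
for (i in 2:506) {
  stock.symbol <- stocks.site %>%
    html_node(paste0("tr:nth-child(", i, ") td:nth-child(1) .text")) %>%
    html_text()
  stocks[i - 1, 1] <- stock.symbol
}

# Write one symbol per line, without header, row names or quotes
write.table(stocks, file = 'stocks_list.txt', col.names = FALSE, row.names = FALSE,
            quote = FALSE)
```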
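For comparison, the same "first cell of every row" idea can be sketched in Python with only the standard library's `html.parser`. This is not what the original solution used: it runs on a tiny hard-coded HTML snippet with two sample rows shaped like the Wikipedia table instead of the live page, and `FirstColumnParser` is a hypothetical name. It assumes data rows contain only `<td>` cells and the header row only `<th>` cells, so, like the selector starting at row 2, it never picks up the header.

```python
from html.parser import HTMLParser


class FirstColumnParser(HTMLParser):
    """Collect the text of the first <td> in every table row.

    Header rows use <th> cells, so they contribute nothing; this mirrors
    starting the CSS selector at tr:nth-child(2).
    """

    def __init__(self):
        super().__init__()
        self.symbols = []
        self._cell_index = -1        # index of the current <td> within its row
        self._in_first_cell = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cell_index = -1    # new row: reset the cell counter
        elif tag == "td":
            self._cell_index += 1
            if self._cell_index == 0:
                self._in_first_cell = True
                self._buffer = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_first_cell:
            self.symbols.append("".join(self._buffer).strip())
            self._in_first_cell = False

    def handle_data(self, data):
        # Text inside nested tags (e.g. the <a> link) is collected as well
        if self._in_first_cell:
            self._buffer.append(data)


SAMPLE_HTML = """
<table>
  <tr><th>Symbol</th><th>Security</th></tr>
  <tr><td><a href="#">MMM</a></td><td>3M</td></tr>
  <tr><td><a href="#">ABT</a></td><td>Abbott Laboratories</td></tr>
</table>
"""

parser = FirstColumnParser()
parser.feed(SAMPLE_HTML)
print(parser.symbols)  # ['MMM', 'ABT']
```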

Now that we have scraped the data, it's time to write the solution.

### Finding the highest adjusted close price standard deviation

The first step is to load the required libraries. As I mentioned at the beginning, we'll be using the Pandas library.

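```python
import datetime
import pandas as pd
# pandas.io.data was pandas' remote-data module in 2015; it was later
# removed from pandas and moved to the separate pandas-datareader package
import pandas.io.data
```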

Next, open the file with the stock symbols and create an empty list that will hold them.

```python
# Load the stock symbols from the file created by the R script
stocks_file = open('stocks_list.txt', 'r')
stocks_list = []
```

Then, specify the time period:

```python
# Note: the stock market is closed on New Year's Day
start_date = datetime.datetime(2015, 1, 1)
end_date = datetime.datetime(2015, 9, 28)
```

To keep track of the largest standard deviation and its stock, I used a dictionary that will be updated each time the program finds a standard deviation greater than the current one.

```python
# This dict holds the stock with the highest std found so far
current_highest_adjclose = {'stock': 'placeholder', 'stdev': -1}
```
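The update rule can be sketched in isolation. In this minimal version, `update_highest` and the (symbol, standard deviation) pairs are hypothetical, while the dictionary keys and the `-1` placeholder follow the script. Because a standard deviation is never negative, the `-1` sentinel guarantees that the first real value replaces the placeholder.

```python
def update_highest(current, stock, stdev):
    """Replace the tracked stock if `stdev` beats the current maximum.

    Uses the same dictionary shape as the script: {'stock': ..., 'stdev': ...}.
    The strict '>' means that on a tie the stock seen first is kept.
    """
    if stdev > current['stdev']:
        current['stock'] = stock
        current['stdev'] = stdev
    return current


# Hypothetical (symbol, standard deviation) pairs, for illustration only
current_highest_adjclose = {'stock': 'placeholder', 'stdev': -1}
for symbol, sd in [('AAA', 2.5), ('BBB', 7.25), ('CCC', 7.25), ('DDD', 1.0)]:
    update_highest(current_highest_adjclose, symbol, sd)

print(current_highest_adjclose)  # {'stock': 'BBB', 'stdev': 7.25}
```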

Now that the structures have been created, it's time to add the stocks to `stocks_list`.

```python
# Add the stocks to a new list while removing \n
for line in stocks_file:
    stocks_list.append(line.strip('\n'))
```
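A slightly more defensive variant of this step is sketched below as an assumption, not the original code: calling `strip()` with no argument also removes stray `\r` characters and spaces, and blank lines are skipped. The function name `load_symbols` is hypothetical, and `io.StringIO` stands in for `stocks_list.txt` so the example runs without the file.

```python
import io


def load_symbols(handle):
    """Return the non-empty, whitespace-stripped lines of an open text file."""
    symbols = []
    for line in handle:
        symbol = line.strip()  # removes '\n', '\r' and surrounding spaces
        if symbol:             # ignore blank lines, e.g. a trailing empty line
            symbols.append(symbol)
    return symbols


# With the real file: with open('stocks_list.txt') as f: stocks_list = load_symbols(f)
fake_file = io.StringIO("MMM\r\nABT\n\nABBV\n")
print(load_symbols(fake_file))  # ['MMM', 'ABT', 'ABBV']
```

Using `with open(...)` for the real file also closes it automatically once the symbols are read.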

The next piece of code is the main part of the script. It is a loop that iterates through the list of stocks and, for each stock, does the following:

• It downloads the price history of `stock` using `pd.io.data.get_data_yahoo`, which returns a DataFrame. From that DataFrame, we only need the 'Adj Close' column.
• Then it calculates the standard deviation of the column.
• It checks whether this standard deviation is greater than the one stored in the dictionary. If it is, it replaces the stored value with the new standard deviation and the stored name with the current stock; otherwise, it moves on to the next stock.
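The steps above can be sketched without network access. In this illustration, which is not the script itself, the hypothetical `fetch_adj_close` function stands in for the Yahoo! download: it returns made-up prices and raises `IOError` for unknown symbols. It uses `statistics.stdev`, which, like pandas' `Series.std()` with its default `ddof=1`, computes the sample standard deviation. As a design alternative to updating a dictionary inside the loop, it collects every result and picks the maximum at the end, which keeps the per-stock values available for inspection.

```python
import statistics

# Made-up adjusted close prices, for illustration only
FAKE_PRICES = {
    'AAA': [10.0, 11.0, 12.0],
    'BBB': [100.0, 90.0, 110.0],
    'CCC': [50.0, 50.5, 49.5],
}


def fetch_adj_close(symbol):
    """Hypothetical stand-in for downloading a stock's 'Adj Close' column."""
    if symbol not in FAKE_PRICES:
        raise IOError('no data for %s' % symbol)
    return FAKE_PRICES[symbol]


def highest_stdev(symbols):
    """Return (symbol, stdev) for the symbol with the largest sample std.

    Raises ValueError if no symbol could be downloaded.
    """
    results = {}
    for symbol in symbols:
        try:
            prices = fetch_adj_close(symbol)
        except IOError:
            continue  # assumed policy: skip symbols whose download fails
        results[symbol] = statistics.stdev(prices)
    best = max(results, key=results.get)
    return best, results[best]


print(highest_stdev(['AAA', 'BBB', 'ZZZ', 'CCC']))  # ('BBB', 10.0)
```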
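```python
for stock in stocks_list:
    print(stock)
    try:
        s = pd.io.data.get_data_yahoo(stock, start=start_date,
                                      end=end_date)['Adj Close']
        current_standard_deviation = s.std()
        if current_standard_deviation > current_highest_adjclose['stdev']:
            current_highest_adjclose['stdev'] = current_standard_deviation
            current_highest_adjclose['stock'] = stock
    except IOError:
        # Assumed handling: skip symbols for which no data could be downloaded
        continue
```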
```print 'Highest \'Adj Close\' is: %f (%s)' % (