Chapter 3 Data transformation

To merge the beer ratings data downloaded by the web scraper into a single CSV file, and the brewery ratings data into a second CSV file, we used the Python script CraftBeerRatingsAnalysis/brewery_beer_parser/data_merger.py, available in the repository.

To run the data merger and save the merged CSV files to the data folder, follow the steps below.

$ cd CraftBeerRatingsAnalysis/brewery_beer_parser

$ python3 data_merger.py

Note: We added the state and brewery name to the beer dataset so that brewery data can easily be joined to beer data if that is needed in further analysis.

The merged data are exported to the data directory of the project.

Beer data: CraftBeerRatingsAnalysis/data/all_beers.csv

Brewery data: CraftBeerRatingsAnalysis/data/all_breweries.csv
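The merge step described above can be sketched as follows. This is a minimal illustration and not the contents of data_merger.py: the function name merge_csv_files, the extra_columns parameter, and the assumed folder layout (one subfolder per state, one CSV per brewery) are hypothetical and only show how per-file CSVs might be concatenated while adding state and brewery name columns.

```python
import csv
from pathlib import Path


def merge_csv_files(input_dir, output_file, extra_columns=None):
    """Concatenate every CSV found under input_dir into output_file.

    extra_columns maps a new column name to a function that derives its
    value from the source file path (a hypothetical convention).
    Returns the number of data rows written.
    """
    extra_columns = extra_columns or {}
    rows, fieldnames = [], []
    for path in sorted(Path(input_dir).rglob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                for column, derive in extra_columns.items():
                    row[column] = derive(path)
                # Track columns in first-seen order so the header is stable.
                for column in row:
                    if column not in fieldnames:
                        fieldnames.append(column)
                rows.append(row)

    output_file = Path(output_file)
    output_file.parent.mkdir(parents=True, exist_ok=True)
    with output_file.open("w", newline="", encoding="utf-8") as handle:
        # restval fills columns that are missing from some source files.
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Under the assumed layout, a call such as merge_csv_files("raw/beers", "data/all_beers.csv", {"state": lambda p: p.parent.name, "brewery_name": lambda p: p.stem}) would tag each beer row with the state and brewery it came from; the input path here is likewise illustrative. The output file should live outside the input directory so it is not picked up on a later run.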

3.1 Beer Ratings Data:

Before Cleaning

We removed the "% ABV" characters from the beer_abv column and the "IBU" characters from the beer_ibu column, then converted both columns to a numeric data type. Next, we converted the beer_added column to a date data type and added a year column required by the downstream analysis. In addition, we standardized the beer style names, because the same style can be referred to differently in the data; e.g., IPA - New England / Hazy and IPA - New England both refer to the New England IPA. Finally, we cleaned up the state names in cleaned_all_beers by removing underscores. After cleaning, we have 124,916 beers in total.
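The cleaning steps above can be sketched per row as follows. This is an illustrative sketch, not the code used in the analysis. The column names beer_abv, beer_ibu, and beer_added come from the text; the beer_style and state column names, the date format, the choice of IPA - New England as the canonical style name, and replacing underscores with spaces are all assumptions.

```python
from datetime import datetime

# Hypothetical mapping from variant style names to one canonical name.
STYLE_ALIASES = {"IPA - New England / Hazy": "IPA - New England"}


def to_number(text, unit):
    """Strip a unit such as '% ABV' or 'IBU'; return a float, or None."""
    cleaned = text.replace(unit, "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None  # e.g. 'N/A' or an empty string


def clean_beer(row, date_format="%m/%d/%y"):
    """Return a cleaned copy of one beer row (a dict of strings).

    date_format is an assumption; adjust it to the scraped data.
    """
    added = datetime.strptime(row["beer_added"], date_format).date()
    return {
        **row,
        "beer_abv": to_number(row["beer_abv"], "% ABV"),
        "beer_ibu": to_number(row["beer_ibu"], "IBU"),
        "beer_added": added,
        "year": added.year,
        # Fall back to the original name when no alias is defined.
        "beer_style": STYLE_ALIASES.get(row["beer_style"], row["beer_style"]),
        # Assumed intent: 'New_York' becomes 'New York'.
        "state": row["state"].replace("_", " "),
    }
```

Returning None for values that cannot be parsed keeps missing ABV or IBU entries distinguishable from a genuine zero.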

After Cleaning

3.2 Brewery Ratings Data:

We removed unnecessary columns from the all_breweries dataset and renamed the name column to brewery_name. Furthermore, we removed breweries flagged as closed, proprietor, or bar, as well as breweries with fewer than 10 beers. After cleaning, there are 5,015 breweries in total.
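The brewery filtering above can be sketched as follows. This is a sketch under stated assumptions rather than the actual cleaning code: only the name and brewery_name columns come from the text, while the flags and num_beers column names, the comma-separated flag format, and the function name clean_breweries are hypothetical.

```python
EXCLUDED_FLAGS = {"closed", "proprietor", "bar"}


def clean_breweries(rows, keep_columns, min_beers=10):
    """Filter brewery rows and keep only the requested columns.

    rows is an iterable of dicts; 'flags' (comma-separated labels) and
    'num_beers' are hypothetical column names.
    """
    cleaned = []
    for row in rows:
        flags = {f.strip().lower() for f in row["flags"].split(",") if f.strip()}
        if flags & EXCLUDED_FLAGS:
            continue  # closed, proprietor, or bar
        if int(row["num_beers"]) < min_beers:
            continue  # fewer than min_beers beers
        # Rename 'name' to 'brewery_name' and drop all unlisted columns.
        record = {"brewery_name": row["name"]}
        record.update({column: row[column] for column in keep_columns})
        cleaned.append(record)
    return cleaned
```

A brewery with exactly 10 beers is kept, matching the "fewer than 10" removal rule.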

After Cleaning