US flight cancellations in 2008

Sample Data

Updating the Month column with month names and the DayOfWeek column with day names (1 is Monday, 7 is Sunday).

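library(dplyr)  # %>%, filter, group_by, count, arrange
library(DT)     # datatable
library(readr)  # write_csv

# flights is assumed to be loaded already (the 2008 flight data)
newflights <- flights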

# Map day codes 1..7 to names by indexing a lookup vector
day_names <- c('Monday', 'Tuesday', 'Wednesday', 'Thursday',
               'Friday', 'Saturday', 'Sunday')
newflights$DayOfWeek <- day_names[as.integer(newflights$DayOfWeek)]

# Map month codes 1..12 to names using R's built-in month.name
newflights$Month <- month.name[as.integer(newflights$Month)]

Creating a data frame with the count of cancelled flights for each combination of month and day of week, then arranging it in chronological order (January to December, and Sunday to Saturday within each month).

cancelledByMonthDay <- newflights %>%
  filter(Cancelled == 1) %>%
  group_by(Month, DayOfWeek) %>%
  count()

cancelledByMonthDay$Month <- factor(cancelledByMonthDay$Month, 
                                        levels = month.name)
cancelledByMonthDay$DayOfWeek <- factor(cancelledByMonthDay$DayOfWeek, 
                                           levels = c("Sunday", 
                                                      "Monday", 
                                                      "Tuesday", 
                                                      "Wednesday", 
                                                      "Thursday", 
                                                      "Friday", 
                                                      "Saturday"))

# Rename the count column and sort by month, then by day of week
cancelledByMonthDay <- cancelledByMonthDay %>%
  ungroup() %>%
  rename(Count = n) %>%
  arrange(Month, DayOfWeek)

cancelledByMonthDay %>%
  datatable()

Plotting a grouped bar chart of total annual cancellations, broken down by month and day of week. This plot is an initial sketch of the final plot that I intend to create with D3.js.
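
The plotting code itself is not reproduced here. A minimal ggplot2 sketch of such a grouped bar chart might look like the following. The `toy` data frame and its counts are made up for illustration; they only mimic the shape of cancelledByMonthDay (Month, DayOfWeek, Count) and are not the 2008 figures.

```r
library(ggplot2)

# Hypothetical stand-in for cancelledByMonthDay: three months, seven days,
# placeholder counts (NOT real cancellation numbers).
day_levels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")
toy <- expand.grid(Month = month.name[1:3],
                   DayOfWeek = day_levels,
                   stringsAsFactors = FALSE)
toy$Month <- factor(toy$Month, levels = month.name[1:3])
toy$DayOfWeek <- factor(toy$DayOfWeek, levels = day_levels)
toy$Count <- seq_len(nrow(toy)) * 10

# One cluster of bars per month, one bar per day of week within the cluster
p <- ggplot(toy, aes(x = Month, y = Count, fill = DayOfWeek)) +
  geom_col(position = "dodge") +
  labs(title = "Cancelled flights by month and day of week",
       x = "Month", y = "Cancelled flights", fill = "Day of week")

print(p)
```

On the real data, passing cancelledByMonthDay in place of `toy` should give one cluster of seven bars for each of the twelve months.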

Reshaping the data frame into wide format (one row per month, one column per day of week) for D3.js ingestion.

# Pivot to wide format: one row per month, one column per day of week.
# tidyr::pivot_wider (tidyr >= 1.0.0) replaces the column-by-column
# assignment, which relied on vector recycling and only worked when every
# month/day combination was present. Missing combinations become NA here.
cancelled_df <- cancelledByMonthDay %>%
  ungroup() %>%
  tidyr::pivot_wider(names_from = DayOfWeek, values_from = Count)

Writing the reshaped data frame to CSV.

write_csv(cancelled_df, "total_cancellations_2008.csv")
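
As a quick sanity check on the layout that D3.js will read, the file can be written and read back. The sketch below uses a made-up two-month excerpt (`toy_wide`, with hypothetical counts and only two day columns) and a temporary file, so it does not touch the real output.

```r
library(readr)

# Hypothetical excerpt in the wide layout (placeholder counts)
toy_wide <- data.frame(
  Month  = c("January", "February"),
  Sunday = c(10, 12),
  Monday = c(8, 9),
  stringsAsFactors = FALSE
)

# Write to a temporary file, then read it back with base R
path <- tempfile(fileext = ".csv")
write_csv(toy_wide, path)
round_trip <- read.csv(path, stringsAsFactors = FALSE)

# The header row should be Month followed by one column per day
print(names(round_trip))
print(round_trip)
```

The same check on total_cancellations_2008.csv should show a Month column followed by seven day columns, with one row per month.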