Chapter 7 Manipulación de datos en R con dplyr

7.1 Introducción a dpyr y tbl (tibble)

La mayor parte de los conjuntos de datos tienen más información de la que contienen en su interior. dplyr te permite acceder a esa información, ya que introduce una gramática para la manipulación de datos.

Por ejemplo, incluye 5 funciones simples que puden ser usadas para crear nuevas variables, observaciones y nuevas formas de describir tus datos. Se pueden usar también para hacer subconjuntos de datos. Y además dplyr es muy rápido, ya que está escrito en C++, lo que significa que tenemos la velocidad de C pero con la facilidad de R.

En este curso vamos a aprender a manejar dplyr como un experto, así como a utilizar la estructura tbl y el operador pipe %>%.

También vamos a aprender a acceder a datos almacenados en bases de datos, que ofrece una forma sencilla de trabajar con datos que son muy grandes para encajar en R al mismo tiempo.

Vamos a cargar dplyr y hflights que contiene un conjunto de datos con información de los vuelos que despegan desde Houston.

# Load the dplyr package
library(dplyr)

# Load the hflights package
library(hflights)

# Call both head() and summary() on hflights
head(hflights)
##      Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier
## 5424 2011     1          1         6    1400    1500            AA
## 5425 2011     1          2         7    1401    1501            AA
## 5426 2011     1          3         1    1352    1502            AA
## 5427 2011     1          4         2    1403    1513            AA
## 5428 2011     1          5         3    1405    1507            AA
## 5429 2011     1          6         4    1359    1503            AA
##      FlightNum TailNum ActualElapsedTime AirTime ArrDelay DepDelay Origin
## 5424       428  N576AA                60      40      -10        0    IAH
## 5425       428  N557AA                60      45       -9        1    IAH
## 5426       428  N541AA                70      48       -8       -8    IAH
## 5427       428  N403AA                70      39        3        3    IAH
## 5428       428  N492AA                62      44       -3        5    IAH
## 5429       428  N262AA                64      45       -7       -1    IAH
##      Dest Distance TaxiIn TaxiOut Cancelled CancellationCode Diverted
## 5424  DFW      224      7      13         0                         0
## 5425  DFW      224      6       9         0                         0
## 5426  DFW      224      5      17         0                         0
## 5427  DFW      224      9      22         0                         0
## 5428  DFW      224      9       9         0                         0
## 5429  DFW      224      6      13         0                         0
summary(hflights)
##       Year          Month          DayofMonth      DayOfWeek    
##  Min.   :2011   Min.   : 1.000   Min.   : 1.00   Min.   :1.000  
##  1st Qu.:2011   1st Qu.: 4.000   1st Qu.: 8.00   1st Qu.:2.000  
##  Median :2011   Median : 7.000   Median :16.00   Median :4.000  
##  Mean   :2011   Mean   : 6.514   Mean   :15.74   Mean   :3.948  
##  3rd Qu.:2011   3rd Qu.: 9.000   3rd Qu.:23.00   3rd Qu.:6.000  
##  Max.   :2011   Max.   :12.000   Max.   :31.00   Max.   :7.000  
##                                                                 
##     DepTime        ArrTime     UniqueCarrier        FlightNum   
##  Min.   :   1   Min.   :   1   Length:227496      Min.   :   1  
##  1st Qu.:1021   1st Qu.:1215   Class :character   1st Qu.: 855  
##  Median :1416   Median :1617   Mode  :character   Median :1696  
##  Mean   :1396   Mean   :1578                      Mean   :1962  
##  3rd Qu.:1801   3rd Qu.:1953                      3rd Qu.:2755  
##  Max.   :2400   Max.   :2400                      Max.   :7290  
##  NA's   :2905   NA's   :3066                                    
##    TailNum          ActualElapsedTime    AirTime         ArrDelay      
##  Length:227496      Min.   : 34.0     Min.   : 11.0   Min.   :-70.000  
##  Class :character   1st Qu.: 77.0     1st Qu.: 58.0   1st Qu.: -8.000  
##  Mode  :character   Median :128.0     Median :107.0   Median :  0.000  
##                     Mean   :129.3     Mean   :108.1   Mean   :  7.094  
##                     3rd Qu.:165.0     3rd Qu.:141.0   3rd Qu.: 11.000  
##                     Max.   :575.0     Max.   :549.0   Max.   :978.000  
##                     NA's   :3622      NA's   :3622    NA's   :3622     
##     DepDelay          Origin              Dest              Distance     
##  Min.   :-33.000   Length:227496      Length:227496      Min.   :  79.0  
##  1st Qu.: -3.000   Class :character   Class :character   1st Qu.: 376.0  
##  Median :  0.000   Mode  :character   Mode  :character   Median : 809.0  
##  Mean   :  9.445                                         Mean   : 787.8  
##  3rd Qu.:  9.000                                         3rd Qu.:1042.0  
##  Max.   :981.000                                         Max.   :3904.0  
##  NA's   :2905                                                            
##      TaxiIn           TaxiOut         Cancelled       CancellationCode  
##  Min.   :  1.000   Min.   :  1.00   Min.   :0.00000   Length:227496     
##  1st Qu.:  4.000   1st Qu.: 10.00   1st Qu.:0.00000   Class :character  
##  Median :  5.000   Median : 14.00   Median :0.00000   Mode  :character  
##  Mean   :  6.099   Mean   : 15.09   Mean   :0.01307                     
##  3rd Qu.:  7.000   3rd Qu.: 18.00   3rd Qu.:0.00000                     
##  Max.   :165.000   Max.   :163.00   Max.   :1.00000                     
##  NA's   :3066      NA's   :2947                                         
##     Diverted       
##  Min.   :0.000000  
##  1st Qu.:0.000000  
##  Median :0.000000  
##  Mean   :0.002853  
##  3rd Qu.:0.000000  
##  Max.   :1.000000  
## 
str(hflights)
## 'data.frame':    227496 obs. of  21 variables:
##  $ Year             : int  2011 2011 2011 2011 2011 2011 2011 2011 2011 2011 ...
##  $ Month            : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ DayofMonth       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ DayOfWeek        : int  6 7 1 2 3 4 5 6 7 1 ...
##  $ DepTime          : int  1400 1401 1352 1403 1405 1359 1359 1355 1443 1443 ...
##  $ ArrTime          : int  1500 1501 1502 1513 1507 1503 1509 1454 1554 1553 ...
##  $ UniqueCarrier    : chr  "AA" "AA" "AA" "AA" ...
##  $ FlightNum        : int  428 428 428 428 428 428 428 428 428 428 ...
##  $ TailNum          : chr  "N576AA" "N557AA" "N541AA" "N403AA" ...
##  $ ActualElapsedTime: int  60 60 70 70 62 64 70 59 71 70 ...
##  $ AirTime          : int  40 45 48 39 44 45 43 40 41 45 ...
##  $ ArrDelay         : int  -10 -9 -8 3 -3 -7 -1 -16 44 43 ...
##  $ DepDelay         : int  0 1 -8 3 5 -1 -1 -5 43 43 ...
##  $ Origin           : chr  "IAH" "IAH" "IAH" "IAH" ...
##  $ Dest             : chr  "DFW" "DFW" "DFW" "DFW" ...
##  $ Distance         : int  224 224 224 224 224 224 224 224 224 224 ...
##  $ TaxiIn           : int  7 6 5 9 9 6 12 7 8 6 ...
##  $ TaxiOut          : int  13 9 17 22 9 13 15 12 22 19 ...
##  $ Cancelled        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ CancellationCode : chr  "" "" "" "" ...
##  $ Diverted         : int  0 0 0 0 0 0 0 0 0 0 ...

Este conjunto de datos muestra los registros de todos los vuelos que salieron de Houston (Texas) en 2011. Entre estos datos se encuentran pistas sobre qué aerolíneas son las más fiables. Esto se podría saber conociendo cuáles son las que tienen retrasos mayores o mayores cancelaciones. Como hemos podido ver, hay más de 227.000 vuelos. Visualizar estos datos en R haría que se colapsara.

dplyr nos puede ayudar a visualizar esta gran cantidad de datos, mediante el uso de tbl. Éste es un tipo especial de data frame para grandes conjuntos de datos, entre otros. Un data frame se transformaría usando la función tbl_df() o as_tibble().

hflights <- tbl_df(hflights)
class(hflights)
## [1] "tbl_df"     "tbl"        "data.frame"
hflights
## # A tibble: 227,496 x 21
##     Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier FlightNum
##    <int> <int>      <int>     <int>   <int>   <int> <chr>             <int>
##  1  2011     1          1         6    1400    1500 AA                  428
##  2  2011     1          2         7    1401    1501 AA                  428
##  3  2011     1          3         1    1352    1502 AA                  428
##  4  2011     1          4         2    1403    1513 AA                  428
##  5  2011     1          5         3    1405    1507 AA                  428
##  6  2011     1          6         4    1359    1503 AA                  428
##  7  2011     1          7         5    1359    1509 AA                  428
##  8  2011     1          8         6    1355    1454 AA                  428
##  9  2011     1          9         7    1443    1554 AA                  428
## 10  2011     1         10         1    1443    1553 AA                  428
## # … with 227,486 more rows, and 13 more variables: TailNum <chr>,
## #   ActualElapsedTime <int>, AirTime <int>, ArrDelay <int>,
## #   DepDelay <int>, Origin <chr>, Dest <chr>, Distance <int>,
## #   TaxiIn <int>, TaxiOut <int>, Cancelled <int>, CancellationCode <chr>,
## #   Diverted <int>

Esta función hace que solo se vean las columnas y filas que se ajusten al espacio y al tamaño de la pantalla. También te muestra cuántas filas y columnas faltan por ver y el tipo de datos que contienen.

También podemos usar glimpse(), que muestra los tipos de datos y los valores iniciales de cada columna.

glimpse(hflights)
## Observations: 227,496
## Variables: 21
## $ Year              <int> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011…
## $ Month             <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ DayofMonth        <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,…
## $ DayOfWeek         <int> 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7…
## $ DepTime           <int> 1400, 1401, 1352, 1403, 1405, 1359, 1359, 1355…
## $ ArrTime           <int> 1500, 1501, 1502, 1513, 1507, 1503, 1509, 1454…
## $ UniqueCarrier     <chr> "AA", "AA", "AA", "AA", "AA", "AA", "AA", "AA"…
## $ FlightNum         <int> 428, 428, 428, 428, 428, 428, 428, 428, 428, 4…
## $ TailNum           <chr> "N576AA", "N557AA", "N541AA", "N403AA", "N492A…
## $ ActualElapsedTime <int> 60, 60, 70, 70, 62, 64, 70, 59, 71, 70, 70, 56…
## $ AirTime           <int> 40, 45, 48, 39, 44, 45, 43, 40, 41, 45, 42, 41…
## $ ArrDelay          <int> -10, -9, -8, 3, -3, -7, -1, -16, 44, 43, 29, 5…
## $ DepDelay          <int> 0, 1, -8, 3, 5, -1, -1, -5, 43, 43, 29, 19, -2…
## $ Origin            <chr> "IAH", "IAH", "IAH", "IAH", "IAH", "IAH", "IAH…
## $ Dest              <chr> "DFW", "DFW", "DFW", "DFW", "DFW", "DFW", "DFW…
## $ Distance          <int> 224, 224, 224, 224, 224, 224, 224, 224, 224, 2…
## $ TaxiIn            <int> 7, 6, 5, 9, 9, 6, 12, 7, 8, 6, 8, 4, 6, 5, 6, …
## $ TaxiOut           <int> 13, 9, 17, 22, 9, 13, 15, 12, 22, 19, 20, 11, …
## $ Cancelled         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ CancellationCode  <chr> "", "", "", "", "", "", "", "", "", "", "", ""…
## $ Diverted          <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…

Si queremos que hflights sea un data.frame otra vez, usamos as.data.frame().

hflights <- as.data.frame(hflights)
class(hflights)
## [1] "data.frame"
hflights <- as_tibble(hflights)
class(hflights)
## [1] "tbl_df"     "tbl"        "data.frame"

Como el nombre de las compañías viene codificado en la columna UniqueCarrier, vamos a generar una nueva columna con sus nombre exactos (Carrier).

# Estas son las aerolíneas codificadas
carrier <- hflights$UniqueCarrier

# Vamos a hacer la conversión de esos códigos a su nombre real
lut <- c("AA" = "American", "AS" = "Alaska", "B6" = "JetBlue", "CO" = "Continental", 
         "DL" = "Delta", "OO" = "SkyWest", "UA" = "United", "US" = "US_Airways", 
         "WN" = "Southwest", "EV" = "Atlantic_Southeast", "F9" = "Frontier", 
         "FL" = "AirTran", "MQ" = "American_Eagle", "XE" = "ExpressJet", "YV" = "Mesa")

# Add the Carrier column to hflights
hflights$Carrier <- lut[carrier]

# Glimpse at hflights
glimpse(hflights)
## Observations: 227,496
## Variables: 22
## $ Year              <int> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011…
## $ Month             <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ DayofMonth        <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,…
## $ DayOfWeek         <int> 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7…
## $ DepTime           <int> 1400, 1401, 1352, 1403, 1405, 1359, 1359, 1355…
## $ ArrTime           <int> 1500, 1501, 1502, 1513, 1507, 1503, 1509, 1454…
## $ UniqueCarrier     <chr> "AA", "AA", "AA", "AA", "AA", "AA", "AA", "AA"…
## $ FlightNum         <int> 428, 428, 428, 428, 428, 428, 428, 428, 428, 4…
## $ TailNum           <chr> "N576AA", "N557AA", "N541AA", "N403AA", "N492A…
## $ ActualElapsedTime <int> 60, 60, 70, 70, 62, 64, 70, 59, 71, 70, 70, 56…
## $ AirTime           <int> 40, 45, 48, 39, 44, 45, 43, 40, 41, 45, 42, 41…
## $ ArrDelay          <int> -10, -9, -8, 3, -3, -7, -1, -16, 44, 43, 29, 5…
## $ DepDelay          <int> 0, 1, -8, 3, 5, -1, -1, -5, 43, 43, 29, 19, -2…
## $ Origin            <chr> "IAH", "IAH", "IAH", "IAH", "IAH", "IAH", "IAH…
## $ Dest              <chr> "DFW", "DFW", "DFW", "DFW", "DFW", "DFW", "DFW…
## $ Distance          <int> 224, 224, 224, 224, 224, 224, 224, 224, 224, 2…
## $ TaxiIn            <int> 7, 6, 5, 9, 9, 6, 12, 7, 8, 6, 8, 4, 6, 5, 6, …
## $ TaxiOut           <int> 13, 9, 17, 22, 9, 13, 15, 12, 22, 19, 20, 11, …
## $ Cancelled         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ CancellationCode  <chr> "", "", "", "", "", "", "", "", "", "", "", ""…
## $ Diverted          <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ Carrier           <chr> "American", "American", "American", "American"…

Algo similar ocurre con la columna CancellationCode, que muestra las razones por las que un vuelo fue cancelado.

# The hflights tbl you built in the previous exercise is available in the workspace.

# The lookup table
lut <- c("A" = "carrier", "B" = "weather", "C" = "FFA", "D" = "security", "E" = "not cancelled")

# Add the Code column
hflights$Code <- lut[hflights$CancellationCode]

# Glimpse at hflights
glimpse(hflights)
## Observations: 227,496
## Variables: 23
## $ Year              <int> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011…
## $ Month             <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ DayofMonth        <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,…
## $ DayOfWeek         <int> 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7…
## $ DepTime           <int> 1400, 1401, 1352, 1403, 1405, 1359, 1359, 1355…
## $ ArrTime           <int> 1500, 1501, 1502, 1513, 1507, 1503, 1509, 1454…
## $ UniqueCarrier     <chr> "AA", "AA", "AA", "AA", "AA", "AA", "AA", "AA"…
## $ FlightNum         <int> 428, 428, 428, 428, 428, 428, 428, 428, 428, 4…
## $ TailNum           <chr> "N576AA", "N557AA", "N541AA", "N403AA", "N492A…
## $ ActualElapsedTime <int> 60, 60, 70, 70, 62, 64, 70, 59, 71, 70, 70, 56…
## $ AirTime           <int> 40, 45, 48, 39, 44, 45, 43, 40, 41, 45, 42, 41…
## $ ArrDelay          <int> -10, -9, -8, 3, -3, -7, -1, -16, 44, 43, 29, 5…
## $ DepDelay          <int> 0, 1, -8, 3, 5, -1, -1, -5, 43, 43, 29, 19, -2…
## $ Origin            <chr> "IAH", "IAH", "IAH", "IAH", "IAH", "IAH", "IAH…
## $ Dest              <chr> "DFW", "DFW", "DFW", "DFW", "DFW", "DFW", "DFW…
## $ Distance          <int> 224, 224, 224, 224, 224, 224, 224, 224, 224, 2…
## $ TaxiIn            <int> 7, 6, 5, 9, 9, 6, 12, 7, 8, 6, 8, 4, 6, 5, 6, …
## $ TaxiOut           <int> 13, 9, 17, 22, 9, 13, 15, 12, 22, 19, 20, 11, …
## $ Cancelled         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ CancellationCode  <chr> "", "", "", "", "", "", "", "", "", "", "", ""…
## $ Diverted          <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ Carrier           <chr> "American", "American", "American", "American"…
## $ Code              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…

7.2 Los cinco verbos básicos de dplyr

dplyr hace más que ofrecer una nueva estructura de datos. También ofrece una gramática completa para la manipulación de datos.

Los cinco verbos para tener la gramática de manipulación de datos son:

  • select(), que elimina columnas de un conjunto de datos.
  • filter(), que eliminar filas de un conjunto de datos.
  • arrange(), que reordena las filas de un conjunto de datos.
  • mutate(), que usa los datos para construir nuevas variables.
  • summarize(), que saca un resumen de los datos.

Además, pueden combinarse entre sí. select() y mutate() trabajan con variables (columnas); filter() y arrange() con observaciones (filas); y summarize() con grupos.

Existen otros verbos, como rename(), que funciona como select(), y transmute(), que funciona como mutate().

7.2.1 select()

Vamos a aprender a seleccionar unas variables de hflights con la función select().

select(dataframe, variables que queremos sin comillas y separadas por comas)
# Por ejemplo
select(df, 1:4, -2)

En este último caso también podemos recurrir al negativo para indicar cuál queremos excluir. Seleccionamos las columnas 1 al 4 pero le decimos que la 2 no la queremos.

# Print out a tbl with the four columns of hflights related to delay
select(hflights, ActualElapsedTime, AirTime, ArrDelay, DepDelay)
## # A tibble: 227,496 x 4
##    ActualElapsedTime AirTime ArrDelay DepDelay
##                <int>   <int>    <int>    <int>
##  1                60      40      -10        0
##  2                60      45       -9        1
##  3                70      48       -8       -8
##  4                70      39        3        3
##  5                62      44       -3        5
##  6                64      45       -7       -1
##  7                70      43       -1       -1
##  8                59      40      -16       -5
##  9                71      41       44       43
## 10                70      45       43       43
## # … with 227,486 more rows
# Print out the columns Origin up to Cancelled of hflights
select(hflights, 14:19)
## # A tibble: 227,496 x 6
##    Origin Dest  Distance TaxiIn TaxiOut Cancelled
##    <chr>  <chr>    <int>  <int>   <int>     <int>
##  1 IAH    DFW        224      7      13         0
##  2 IAH    DFW        224      6       9         0
##  3 IAH    DFW        224      5      17         0
##  4 IAH    DFW        224      9      22         0
##  5 IAH    DFW        224      9       9         0
##  6 IAH    DFW        224      6      13         0
##  7 IAH    DFW        224     12      15         0
##  8 IAH    DFW        224      7      12         0
##  9 IAH    DFW        224      8      22         0
## 10 IAH    DFW        224      6      19         0
## # … with 227,486 more rows
# Answer to last question: be concise! Find the most concise way to select: columns Year up to and including DayOfWeek, columns ArrDelay up to and including Diverted. You can examine the order of the variables in hflights with names(hflights) in the console.
select(hflights, 1:4, 12:21)
## # A tibble: 227,496 x 14
##     Year Month DayofMonth DayOfWeek ArrDelay DepDelay Origin Dest  Distance
##    <int> <int>      <int>     <int>    <int>    <int> <chr>  <chr>    <int>
##  1  2011     1          1         6      -10        0 IAH    DFW        224
##  2  2011     1          2         7       -9        1 IAH    DFW        224
##  3  2011     1          3         1       -8       -8 IAH    DFW        224
##  4  2011     1          4         2        3        3 IAH    DFW        224
##  5  2011     1          5         3       -3        5 IAH    DFW        224
##  6  2011     1          6         4       -7       -1 IAH    DFW        224
##  7  2011     1          7         5       -1       -1 IAH    DFW        224
##  8  2011     1          8         6      -16       -5 IAH    DFW        224
##  9  2011     1          9         7       44       43 IAH    DFW        224
## 10  2011     1         10         1       43       43 IAH    DFW        224
## # … with 227,486 more rows, and 5 more variables: TaxiIn <int>,
## #   TaxiOut <int>, Cancelled <int>, CancellationCode <chr>, Diverted <int>

El paquete dplyr viene también con unas funciones muy útiles para combinar con select(). Estas funciones nos ayudan a seleccionar variables según cómo sean el nombre de las mismas:

  • starts_with("X"): every name that starts with “X”,
  • ends_with("X"): every name that ends with “X”,
  • contains("X"): every name that contains “X”,
  • matches("X"): every name that matches “X”, where “X” can be a regular expression,
  • num_range("x", 1:5): the variables named x01, x02, x03, x04 and x05,
  • one_of(x): every name that appears in x, which should be a character vector.
# Print out a tbl containing just ArrDelay and DepDelay
select(hflights, ArrDelay, DepDelay)
## # A tibble: 227,496 x 2
##    ArrDelay DepDelay
##       <int>    <int>
##  1      -10        0
##  2       -9        1
##  3       -8       -8
##  4        3        3
##  5       -3        5
##  6       -7       -1
##  7       -1       -1
##  8      -16       -5
##  9       44       43
## 10       43       43
## # … with 227,486 more rows
# Print out a tbl as described in the second instruction, using both helper functions and variable names. Use a combination of helper functions and variable names to print out only the UniqueCarrier, FlightNum, TailNum, Cancelled, and CancellationCode columns of hflights.
select(hflights, UniqueCarrier, ends_with("Num"), starts_with("Cancell"))
## # A tibble: 227,496 x 5
##    UniqueCarrier FlightNum TailNum Cancelled CancellationCode
##    <chr>             <int> <chr>       <int> <chr>           
##  1 AA                  428 N576AA          0 ""              
##  2 AA                  428 N557AA          0 ""              
##  3 AA                  428 N541AA          0 ""              
##  4 AA                  428 N403AA          0 ""              
##  5 AA                  428 N492AA          0 ""              
##  6 AA                  428 N262AA          0 ""              
##  7 AA                  428 N493AA          0 ""              
##  8 AA                  428 N477AA          0 ""              
##  9 AA                  428 N476AA          0 ""              
## 10 AA                  428 N504AA          0 ""              
## # … with 227,486 more rows
# Print out a tbl as described in the third instruction, using only helper functions. Find the most concise way to return the following columns with select and its helper functions: DepTime, ArrTime, ActualElapsedTime, AirTime, ArrDelay, DepDelay. Use only helper functions!
select(hflights, ends_with("Time"), ends_with("Delay"))
## # A tibble: 227,496 x 6
##    DepTime ArrTime ActualElapsedTime AirTime ArrDelay DepDelay
##      <int>   <int>             <int>   <int>    <int>    <int>
##  1    1400    1500                60      40      -10        0
##  2    1401    1501                60      45       -9        1
##  3    1352    1502                70      48       -8       -8
##  4    1403    1513                70      39        3        3
##  5    1405    1507                62      44       -3        5
##  6    1359    1503                64      45       -7       -1
##  7    1359    1509                70      43       -1       -1
##  8    1355    1454                59      40      -16       -5
##  9    1443    1554                71      41       44       43
## 10    1443    1553                70      45       43       43
## # … with 227,486 more rows

Si comparamos entre la sección base de R con dplyr, vemos claramente su ventaja.

# Finish select call so that ex1d matches ex1r
ex1r <- hflights[c("TaxiIn", "TaxiOut", "Distance")]
ex1d <- select(hflights, starts_with("Taxi"), Distance)

# Finish select call so that ex2d matches ex2r
ex2r <- hflights[c("Year", "Month", "DayOfWeek", "DepTime", "ArrTime")]
ex2d <- select(hflights, 1:6, -3)

# Finish select call so that ex3d matches ex3r
ex3r <- hflights[c("TailNum", "TaxiIn", "TaxiOut")]
ex3d <- select(hflights, starts_with("T"))

7.2.2 mutate()

mutate() hace lo contrario que select(): crea nuevas variables. La estructura es similar a la anterior, donde indicamos al principio el nombre de nuestro conjunto de datos, seguido de l nombre de la nueva columna y una expresión regular.

mutate(df, z = x + y)
# hflights and dplyr are loaded and ready to serve you.

# Add the new variable ActualGroundTime to a copy of hflights and save the result as g1. Create a new data frame, g1, which is the data frame hflights with an additional column: ActualGroundTime, the difference between ActualElapsedTime and AirTime.
g1 <- mutate(hflights, ActualGroundTime = ActualElapsedTime - AirTime)

# Add the new variable GroundTime to g1. Save the result as g2. Extend g1 further, by adding an additional column GroundTime. This column is the sum of the TaxiIn and TaxiOut columns. Store the resulting data frame in g2. Check in the console that the GroundTime and ActualGroundTime columns are equal.
g2 <- mutate(g1, GroundTime = TaxiIn + TaxiOut)

# Add the new variable AverageSpeed to g2. Save the result as g3. Add a new variable to g2 named AverageSpeed that denotes the average speed that each plane flew in miles per hour. Save the resulting dataset as g3. Use the following formula: 60 * Distance / AirTime.
g3 <- mutate(g2, AverageSpeed = 60 * Distance / AirTime)

# Print out g3
g3
## # A tibble: 227,496 x 26
##     Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier FlightNum
##    <int> <int>      <int>     <int>   <int>   <int> <chr>             <int>
##  1  2011     1          1         6    1400    1500 AA                  428
##  2  2011     1          2         7    1401    1501 AA                  428
##  3  2011     1          3         1    1352    1502 AA                  428
##  4  2011     1          4         2    1403    1513 AA                  428
##  5  2011     1          5         3    1405    1507 AA                  428
##  6  2011     1          6         4    1359    1503 AA                  428
##  7  2011     1          7         5    1359    1509 AA                  428
##  8  2011     1          8         6    1355    1454 AA                  428
##  9  2011     1          9         7    1443    1554 AA                  428
## 10  2011     1         10         1    1443    1553 AA                  428
## # … with 227,486 more rows, and 18 more variables: TailNum <chr>,
## #   ActualElapsedTime <int>, AirTime <int>, ArrDelay <int>,
## #   DepDelay <int>, Origin <chr>, Dest <chr>, Distance <int>,
## #   TaxiIn <int>, TaxiOut <int>, Cancelled <int>, CancellationCode <chr>,
## #   Diverted <int>, Carrier <chr>, Code <chr>, ActualGroundTime <int>,
## #   GroundTime <int>, AverageSpeed <dbl>

Hemos calculado una variable nueva a cada paso. Pero lo podemos hacer de una sentada si las combinamos con comas.

# hflights and dplyr are ready, are you?

# Add a second variable loss_ratio to the dataset: m1
m1 <- mutate(hflights, loss = ArrDelay - DepDelay, 
             loss_ratio = loss / DepDelay)

# Add the three variables as described in the third instruction: m2
m2 <- mutate(hflights, TotalTaxi = TaxiIn + TaxiOut,
             ActualGroundTime = ActualElapsedTime - AirTime,
             Diff = TotalTaxi - ActualGroundTime)

7.2.3 filter()

La función filter() nos permite filtrar los datos según unas condiciones. Estas condiciones pueden estar definidas por operadores lógicos.

  • x < y, TRUE if x is less than y
  • x <= y, TRUE if x is less than or equal to y
  • x == y, TRUE if x equals y
  • x != y, TRUE if x does not equal y
  • x >= y, TRUE if x is greater than or equal to y
  • x > y, TRUE if x is greater than y
  • x %in% c(a, b, c), TRUE if x is in the vector c(a, b, c)

Los argumentos principales son elnombre del conjunto de datos y el nombre de las variables con la condición.

library(hflights)
data("hflights")

# All flights that traveled 3000 miles or more
filter(hflights, Distance >= 3000)
##     Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier
## 1   2011     1         31         1     924    1413            CO
## 2   2011     1         30         7     925    1410            CO
## 3   2011     1         29         6    1045    1445            CO
## 4   2011     1         28         5    1516    1916            CO
## 5   2011     1         27         4     950    1344            CO
## 6   2011     1         26         3     944    1350            CO
## 7   2011     1         25         2     924    1337            CO
## 8   2011     1         24         1    1144    1605            CO
## 9   2011     1         23         7     926    1335            CO
## 10  2011     1         22         6     942    1340            CO
## 11  2011     1         21         5     928    1334            CO
## 12  2011     1         20         4     938    1343            CO
## 13  2011     1         19         3     926    1341            CO
## 14  2011     1         18         2     927    1546            CO
## 15  2011     1         17         1     924    1349            CO
## 16  2011     1         16         7     922    1343            CO
## 17  2011     1         15         6     945    1355            CO
## 18  2011     1         14         5    1117    1506            CO
## 19  2011     1         13         4     929    1348            CO
## 20  2011     1         12         3     937    1358            CO
## 21  2011     1         11         2     926    1425            CO
## 22  2011     1         10         1     940    1413            CO
## 23  2011     1          9         7     956    1417            CO
## 24  2011     1          8         6     927    1403            CO
## 25  2011     1          8         6    1156    1618            CO
## 26  2011     1          7         5     930    1355            CO
## 27  2011     1          7         5    1204    1619            CO
## 28  2011     1          6         4     932    1402            CO
## 29  2011     1          6         4    1145    1606            CO
## 30  2011     1          5         3     929    1406            CO
## 31  2011     1          5         3      NA      NA            CO
## 32  2011     1          4         2     941    1357            CO
## 33  2011     1          4         2    1144    1619            CO
## 34  2011     1          3         1     931    1354            CO
## 35  2011     1          3         1    1210    1620            CO
## 36  2011     1          2         7     939    1414            CO
## 37  2011     1          2         7    1151    1625            CO
## 38  2011     1          1         6     942    1356            CO
## 39  2011     1          1         6    1145    1612            CO
## 40  2011     1          1         6    1447    1925            CO
## 41  2011     2         28         1    1040    1517            CO
## 42  2011     2         27         7     932    1403            CO
## 43  2011     2         26         6     928    1409            CO
## 44  2011     2         25         5     937    1408            CO
## 45  2011     2         24         4     930    1358            CO
## 46  2011     2         23         3     927    1355            CO
## 47  2011     2         22         2     939    1405            CO
##     FlightNum TailNum ActualElapsedTime AirTime ArrDelay DepDelay Origin
## 1           1  N69063               529     492       23       -1    IAH
## 2           1  N76064               525     493       20        0    IAH
## 3           1  N69063               480     459       55       80    IAH
## 4           1  N77066               480     463      326      351    IAH
## 5           1  N76055               474     455       -6       25    IAH
## 6           1  N76065               486     471        0       19    IAH
## 7           1  N68061               493     473      -13       -1    IAH
## 8           1  N76064               501     464      135      139    IAH
## 9           1  N76065               489     466      -15        1    IAH
## 10          1  N69063               478     465      -10       17    IAH
## 11          1  N76065               486     461      -16        3    IAH
## 12          1  N77066               485     469       -7       13    IAH
## 13          1  N76064               495     475       -9        1    IAH
## 14          1  N66057                NA      NA       NA        2    IAH
## 15          1  N76065               505     478       -1       -1    IAH
## 16          1  N69063               501     481       -7       -3    IAH
## 17          1  N77066               490     472        5       20    IAH
## 18          1  N67058               469     452       76      112    IAH
## 19          1  N76065               499     478       -2        4    IAH
## 20          1  N77066               501     482        8       12    IAH
## 21          1  N67052               539     497       35        1    IAH
## 22          1  N76064               513     486       23       15    IAH
## 23          1  N69063               501     484       27       31    IAH
## 24          1  N76065               516     489       13        2    IAH
## 25         73  N69063               502     487        8       11    IAH
## 26          1  N76064               505     475        5        5    IAH
## 27         73  N76065               495     471        9       19    IAH
## 28          1  N69063               510     488       12        7    IAH
## 29         73  N76064               501     481       -4        0    IAH
## 30          1  N77066               517     493       16        4    IAH
## 31         73                        NA      NA       NA       NA    IAH
## 32          1  N66051               496     472        7       16    IAH
## 33         73  N69063               515     494        9       -1    IAH
## 34          1  N76065               503     481        4        6    IAH
## 35         73  N77066               490     476       10       25    IAH
## 36          1  N76064               515     496       24       14    IAH
## 37         73  N76065               514     492       15        6    IAH
## 38          1  N69063               494     466        6       17    IAH
## 39         73  N76065               507     486        2        0    IAH
## 40         77  N76062               518     488       35       22    IAH
## 41          1  N77066               517     499       87       75    IAH
## 42          1  N76065               511     488       13        7    IAH
## 43          1  N76064               521     475       19        3    IAH
## 44          1  N77066               511     490       18       12    IAH
## 45          1  N76065               508     487        8        5    IAH
## 46          1  N77066               508     484        5        2    IAH
## 47          1  N66051               506     491       15       14    IAH
##     Dest Distance TaxiIn TaxiOut Cancelled CancellationCode Diverted
## 1    HNL     3904      6      31         0                         0
## 2    HNL     3904     13      19         0                         0
## 3    HNL     3904      4      17         0                         0
## 4    HNL     3904      7      10         0                         0
## 5    HNL     3904      4      15         0                         0
## 6    HNL     3904      5      10         0                         0
## 7    HNL     3904      5      15         0                         0
## 8    HNL     3904      7      30         0                         0
## 9    HNL     3904      6      17         0                         0
## 10   HNL     3904      3      10         0                         0
## 11   HNL     3904      6      19         0                         0
## 12   HNL     3904      4      12         0                         0
## 13   HNL     3904      6      14         0                         0
## 14   HNL     3904      4      17         0                         1
## 15   HNL     3904      5      22         0                         0
## 16   HNL     3904      4      16         0                         0
## 17   HNL     3904      5      13         0                         0
## 18   HNL     3904      5      12         0                         0
## 19   HNL     3904      4      17         0                         0
## 20   HNL     3904      6      13         0                         0
## 21   HNL     3904      4      38         0                         0
## 22   HNL     3904     11      16         0                         0
## 23   HNL     3904      5      12         0                         0
## 24   HNL     3904     14      13         0                         0
## 25   HNL     3904      4      11         0                         0
## 26   HNL     3904      7      23         0                         0
## 27   HNL     3904      6      18         0                         0
## 28   HNL     3904      5      17         0                         0
## 29   HNL     3904      4      16         0                         0
## 30   HNL     3904      5      19         0                         0
## 31   HNL     3904     NA      NA         1                A        0
## 32   HNL     3904      5      19         0                         0
## 33   HNL     3904      6      15         0                         0
## 34   HNL     3904      6      16         0                         0
## 35   HNL     3904      5       9         0                         0
## 36   HNL     3904      5      14         0                         0
## 37   HNL     3904      4      18         0                         0
## 38   HNL     3904      5      23         0                         0
## 39   HNL     3904      4      17         0                         0
## 40   HNL     3904      8      22         0                         0
## 41   HNL     3904      3      15         0                         0
## 42   HNL     3904      4      19         0                         0
## 43   HNL     3904      4      42         0                         0
## 44   HNL     3904      4      17         0                         0
## 45   HNL     3904      4      17         0                         0
## 46   HNL     3904      4      20         0                         0
## 47   HNL     3904      3      12         0                         0
##  [ reached getOption("max.print") -- omitted 480 rows ]
# All flights flown by one of JetBlue, Southwest, or Delta
filter(hflights, UniqueCarrier == "JetBlue" | UniqueCarrier == "Southwest" | UniqueCarrier == "Delta")
##  [1] Year              Month             DayofMonth       
##  [4] DayOfWeek         DepTime           ArrTime          
##  [7] UniqueCarrier     FlightNum         TailNum          
## [10] ActualElapsedTime AirTime           ArrDelay         
## [13] DepDelay          Origin            Dest             
## [16] Distance          TaxiIn            TaxiOut          
## [19] Cancelled         CancellationCode  Diverted         
## <0 rows> (or 0-length row.names)
# All flights where taxiing took longer than flying. Extract from hflights all flights where taxiing took longer than the actual flight. Avoid the use of mutate() and do the math directly in the logical expression of filter().
filter(hflights, TaxiIn + TaxiOut > AirTime)
##      Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier
## 1    2011     1         24         1     731     904            AA
## 2    2011     1         30         7    1959    2132            AA
## 3    2011     1         24         1    1621    1749            AA
## 4    2011     1         10         1     941    1113            AA
## 5    2011     1         31         1    1301    1356            CO
## 6    2011     1         31         1    2113    2215            CO
## 7    2011     1         31         1    1434    1539            CO
## 8    2011     1         31         1     900    1006            CO
## 9    2011     1         30         7    1304    1408            CO
## 10   2011     1         30         7    2004    2128            CO
## 11   2011     1         30         7    1912    2032            CO
## 12   2011     1         30         7    1141    1251            CO
## 13   2011     1         30         7    1922    2032            CO
## 14   2011     1         30         7    1813    1914            CO
## 15   2011     1         30         7    1038    1214            CO
## 16   2011     1         26         3    1533    1629            CO
## 17   2011     1         26         3    1910    2012            CO
## 18   2011     1         25         2    1259    1401            CO
## 19   2011     1         25         2    2101    2158            CO
## 20   2011     1         25         2    1427    1530            CO
## 21   2011     1         24         1    1322    1428            CO
## 22   2011     1         24         1    1153    1253            CO
## 23   2011     1         24         1    1939    2059            CO
## 24   2011     1         24         1    1429    1531            CO
## 25   2011     1         24         1     930    1044            CO
## 26   2011     1         23         7    1301    1409            CO
## 27   2011     1         23         7    1551    1725            CO
## 28   2011     1         22         6     854     958            CO
## 29   2011     1         21         5    1301    1402            CO
## 30   2011     1         20         4    1835    2007            CO
## 31   2011     1         20         4    1600    1736            CO
## 32   2011     1         20         4    1304    1421            CO
## 33   2011     1         20         4    1928    2109            CO
## 34   2011     1         20         4    1934    2101            CO
## 35   2011     1         20         4    1434    1607            CO
## 36   2011     1         20         4    1158    1304            CO
## 37   2011     1         20         4    1509    1631            CO
## 38   2011     1         20         4    1911    2024            CO
## 39   2011     1         20         4    1429    1538            CO
## 40   2011     1         20         4    1417    1623            CO
## 41   2011     1         19         3    1906    2003            CO
## 42   2011     1         18         2    1544    1646            CO
## 43   2011     1         18         2    1430    1537            CO
## 44   2011     1         18         2     900    1015            CO
## 45   2011     1         18         2     930    1055            CO
## 46   2011     1         17         1    1309    1412            CO
## 47   2011     1         17         1    1432    1539            CO
##      FlightNum TailNum ActualElapsedTime AirTime ArrDelay DepDelay Origin
## 1          460  N545AA                93      42       29       11    IAH
## 2          533  N455AA                93      43       12       -6    IAH
## 3         1121  N484AA                88      43        4       -9    IAH
## 4         1436  N591AA                92      45       48       31    IAH
## 5          241  N14629                55      27       -2       -4    IAH
## 6         1533  N72405                62      30       20       13    IAH
## 7         1541  N16646                65      30       15        4    IAH
## 8         1583  N36207                66      32       10        0    IAH
## 9          241  N14645                64      31       10       -1    IAH
## 10         423  N16632                84      40       54       39    IAH
## 11         479  N73276                80      37       16        2    IAH
## 12         741  N76523                70      30       15       -4    IAH
## 13        1411  N38268                70      30       30       17    IAH
## 14        1558  N77295                61      30       23       18    IAH
## 15        1823  N14653                96      44       35        3    IAH
## 16          35  N19638                56      27        1       -2    IAH
## 17        1411  N33266                62      29       10        5    IAH
## 18         241  N32626                62      29        3       -6    IAH
## 19        1533  N32404                57      28        3        1    IAH
## 20        1541  N16617                63      29        6       -3    IAH
## 21         241  N24633                66      28       30       17    IAH
## 22         741  N36207                60      29       12        8    IAH
## 23        1411  N73275                80      28       57       34    IAH
## 24        1541  N32626                62      28        7       -1    IAH
## 25        1583  N53441                74      28       48       30    IAH
## 26         241  N46625                68      30       11       -4    IAH
## 27        1787  N59630                94      42       18       -4    IAH
## 28        1583  N14242                64      29        7       -6    IAH
## 29         241  N14639                61      28        4       -4    IAH
## 30           5  N24212                92      44       33       10    IAH
## 31           6  N78285                96      38       41        5    IAH
## 32         241  N27610                77      29       23       -1    IAH
## 33         423  N35271               101      44       35        3    IAH
## 34         479  N76504                87      35       45       24    IAH
## 35         623  N75851                93      44       30        4    IAH
## 36         741  N26215                66      30       23       13    IAH
## 37        1079  N73259                82      37       44       24    IAH
## 38        1411  N76516                73      31       22        6    IAH
## 39        1541  N17627                69      31       14       -1    IAH
## 40        1621  N17620               126      62       43       -3    IAH
## 41        1411  N33264                57      28        1        1    IAH
## 42          35  N38257                62      30       18        9    IAH
## 43        1541  N16646                67      29       13        0    IAH
## 44        1583  N39416                75      32       19        0    IAH
## 45        1679  N16732                85      41       25        5    IAH
## 46         241  N14604                63      28       14        4    IAH
## 47        1541  N24633                67      30       15        2    IAH
##      Dest Distance TaxiIn TaxiOut Cancelled CancellationCode Diverted
## 1     DFW      224     14      37         0                         0
## 2     DFW      224     10      40         0                         0
## 3     DFW      224     10      35         0                         0
## 4     DFW      224     27      20         0                         0
## 5     AUS      140      5      23         0                         0
## 6     AUS      140      7      25         0                         0
## 7     AUS      140      5      30         0                         0
## 8     AUS      140      5      29         0                         0
## 9     AUS      140      6      27         0                         0
## 10    MSY      305     10      34         0                         0
## 11    SAT      191      6      37         0                         0
## 12    AUS      140      4      36         0                         0
## 13    AUS      140      6      34         0                         0
## 14    AUS      140      6      25         0                         0
## 15    MSY      305      3      49         0                         0
## 16    AUS      140     12      17         0                         0
## 17    AUS      140      8      25         0                         0
## 18    AUS      140      6      27         0                         0
## 19    AUS      140      7      22         0                         0
## 20    AUS      140      5      29         0                         0
## 21    AUS      140      5      33         0                         0
## 22    AUS      140      5      26         0                         0
## 23    AUS      140      6      46         0                         0
## 24    AUS      140      7      27         0                         0
## 25    AUS      140      8      38         0                         0
## 26    AUS      140      8      30         0                         0
## 27    DFW      224      9      43         0                         0
## 28    AUS      140      5      30         0                         0
## 29    AUS      140      7      26         0                         0
## 30    MSY      305      4      44         0                         0
## 31    SAT      191      6      52         0                         0
## 32    AUS      140      7      41         0                         0
## 33    MSY      305      5      52         0                         0
## 34    SAT      191      5      47         0                         0
## 35    MSY      305      4      45         0                         0
## 36    AUS      140      5      31         0                         0
## 37    SAT      191      5      40         0                         0
## 38    AUS      140      5      37         0                         0
## 39    AUS      140      6      32         0                         0
## 40    MFE      316      4      60         0                         0
## 41    AUS      140      5      24         0                         0
## 42    AUS      140      4      28         0                         0
## 43    AUS      140      5      33         0                         0
## 44    AUS      140      5      38         0                         0
## 45    SAT      191      4      40         0                         0
## 46    AUS      140      6      29         0                         0
## 47    AUS      140     12      25         0                         0
##  [ reached getOption("max.print") -- omitted 1342 rows ]
# All flights that departed before 5am or arrived after 10pm
filter(hflights, DepTime < 500 | ArrTime > 2200)
##       Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier
## 1     2011     1          4         2    2100    2207            AA
## 2     2011     1         14         5    2119    2229            AA
## 3     2011     1         10         1    1934    2235            AA
## 4     2011     1         26         3    1905    2211            AA
## 5     2011     1         30         7    1856    2209            AA
## 6     2011     1          9         7    1938    2228            AS
## 7     2011     1         31         1    1919    2231            CO
## 8     2011     1         31         1    2116    2344            CO
## 9     2011     1         31         1    1850    2211            CO
## 10    2011     1         31         1    2102    2216            CO
## 11    2011     1         31         1    1909    2254            CO
## 12    2011     1         31         1    1925    2202            CO
## 13    2011     1         31         1    2056    2217            CO
## 14    2011     1         31         1    1901    2332            CO
## 15    2011     1         31         1    2102    2222            CO
## 16    2011     1         31         1    1930    2225            CO
## 17    2011     1         31         1    1917    2234            CO
## 18    2011     1         31         1    1915    2248            CO
## 19    2011     1         31         1    1940    2349            CO
## 20    2011     1         31         1    1930    2224            CO
## 21    2011     1         31         1    2143    2338            CO
## 22    2011     1         31         1    2105    2311            CO
## 23    2011     1         31         1    1805    2211            CO
## 24    2011     1         31         1    2120    2323            CO
## 25    2011     1         31         1    2107    2247            CO
## 26    2011     1         31         1    1903    2228            CO
## 27    2011     1         31         1    2101    2215            CO
## 28    2011     1         31         1    2053    2235            CO
## 29    2011     1         31         1    1918    2247            CO
## 30    2011     1         31         1    2113    2215            CO
## 31    2011     1         31         1    1901    2203            CO
## 32    2011     1         31         1    2100    2252            CO
## 33    2011     1         31         1    1910    2212            CO
## 34    2011     1         31         1    2113    2253            CO
## 35    2011     1         31         1    1911    2220            CO
## 36    2011     1         31         1    1926    2318            CO
## 37    2011     1         31         1    1922    2229            CO
## 38    2011     1         30         7    2128    2252            CO
## 39    2011     1         30         7    1920    2236            CO
## 40    2011     1         30         7    1848    2224            CO
## 41    2011     1         30         7    2112    2301            CO
## 42    2011     1         30         7    1919    2335            CO
## 43    2011     1         30         7    1930    2230            CO
## 44    2011     1         30         7    1934    2318            CO
## 45    2011     1         30         7    2106    2212            CO
## 46    2011     1         30         7    2100    2229            CO
## 47    2011     1         30         7    1910    2215            CO
##       FlightNum TailNum ActualElapsedTime AirTime ArrDelay DepDelay Origin
## 1           533  N4XGAA                67      42       47       55    IAH
## 2           533  N549AA                70      45       69       74    IAH
## 3          1294  N3BXAA               121     107       80       99    IAH
## 4          1294  N3BXAA               126     111       56       70    IAH
## 5          1294  N3CPAA               133     108       54       61    IAH
## 6           731  N609AS               290     253       78       73    IAH
## 7           190  N35260               132     107      -12       -1    IAH
## 8           209  N24715               268     256      -15       -7    IAH
## 9           250  N59630               141     121      -18        0    IAH
## 10          299  N17244               134     119      -10        8    IAH
## 11          426  N78506               165     143      -12       -1    IAH
## 12          444  N76514               157     117        3        5    IAH
## 13          511  N77520                81      60        5        1    IAH
## 14          582  N19621               211     188       -1        6    IAH
## 15          597  N33203               200     179      -10        0    IAH
## 16          616  N33294               175     139       24       20    IAH
## 17          644  N37267               137     116       -3       22    IAH
## 18          658  N37273               153     128       -6       10    IAH
## 19          732  N33262               189     157       44       50    IAH
## 20          755  N11641               114      89      -10       -1    IAH
## 21          770  N37281               235     224       24       50    IAH
## 22          820  N37255               186     158       19       10    IAH
## 23         1010  N73259               186     161       -9       15    IAH
## 24         1046  N79402               123     106       -4       25    IAH
## 25         1095  N73270               220     201        7       12    IAH
## 26         1417  N46625               145     126      -32       -2    IAH
## 27         1423  N62631                74      42        4       -4    IAH
## 28         1459  N87513               222     201        2        3    IAH
## 29         1488  N16701               149     121      -24       -2    IAH
## 30         1533  N72405                62      30       20       13    IAH
## 31         1586  N75428               122     103       -9        6    IAH
## 32         1644  N76504               232     217      -13       10    IAH
## 33         1662  N17229               122      97       -9        0    IAH
## 34         1717  N77296               220     194       20       13    IAH
## 35         1748  N75433               129     108      -11        6    IAH
## 36         1776  N76503               172     142      -10       11    IAH
## 37         1830  N39423               127     103      -18       -3    IAH
## 38          150  N37281               204     188       15       31    IAH
## 39          190  N37267               136     107       -7        0    IAH
## 40          250  N73259               156     126       -1       -2    IAH
## 41          299  N12225               169     119       35       18    IAH
## 42          426  N17245               196     151       29        9    IAH
## 43          444  N27421               180     142       31       10    IAH
## 44          500  N79521               164     113       28        9    IAH
## 45          511  N54711                66      53        0       11    IAH
## 46          597  N36444               209     178        1        2    IAH
## 47          616  N18220               185     145       14        0    IAH
##       Dest Distance TaxiIn TaxiOut Cancelled CancellationCode Diverted
## 1      DFW      224      3      22         0                         0
## 2      DFW      224      5      20         0                         0
## 3      MIA      964      3      11         0                         0
## 4      MIA      964      5      10         0                         0
## 5      MIA      964      7      18         0                         0
## 6      SEA     1874      5      32         0                         0
## 7      MIA      964      5      20         0                         0
## 8      PDX     1825      4       8         0                         0
## 9      RDU     1043      5      15         0                         0
## 10     DEN      862      6       9         0                         0
## 11     BWI     1235      4      18         0                         0
## 12     ORD      925     18      22         0                         0
## 13     MFE      316      4      17         0                         0
## 14     BOS     1597      7      16         0                         0
## 15     LAS     1222      9      12         0                         0
## 16     MSP     1034     11      25         0                         0
## 17     CLE     1091      5      16         0                         0
## 18     DCA     1208      9      16         0                         0
## 19     LGA     1416      5      27         0                         0
## 20     ATL      689      9      16         0                         0
## 21     SFO     1635      5       6         0                         0
## 22     PHX     1009      6      22         0                         0
## 23     EWR     1400      9      16         0                         0
## 24     ORD      925      7      10         0                         0
## 25     LAX     1379      7      12         0                         0
## 26     PIT     1117      6      13         0                         0
## 27     MSY      305      6      26         0                         0
## 28     SNA     1347      8      13         0                         0
## 29     DTW     1076      6      22         0                         0
## 30     AUS      140      7      25         0                         0
## 31     MCO      853      7      12         0                         0
## 32     SMF     1609      3      12         0                         0
## 33     RSW      861      4      21         0                         0
## 34     SAN     1303      3      23         0                         0
## 35     FLL      965      5      16         0                         0
## 36     PHL     1324      7      23         0                         0
## 37     PBI      956      5      19         0                         0
## 38     ONT     1334      4      12         0                         0
## 39     MIA      964      4      25         0                         0
## 40     RDU     1043      9      21         0                         0
## 41     DEN      862     24      26         0                         0
## 42     BWI     1235      7      38         0                         0
## 43     ORD      925      5      33         0                         0
## 44     IND      845      9      42         0                         0
## 45     MFE      316      4       9         0                         0
## 46     LAS     1222      7      24         0                         0
## 47     MSP     1034      5      35         0                         0
##  [ reached getOption("max.print") -- omitted 27752 rows ]
# All flights that departed late but arrived ahead of schedule
filter(hflights, DepDelay > 0 & ArrDelay < 0)
##       Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier
## 1     2011     1          2         7    1401    1501            AA
## 2     2011     1          5         3    1405    1507            AA
## 3     2011     1         18         2    1408    1508            AA
## 4     2011     1         18         2     721     827            AA
## 5     2011     1         12         3    2015    2113            AA
## 6     2011     1         13         4    2020    2116            AA
## 7     2011     1         26         3    2009    2103            AA
## 8     2011     1          1         6    1631    1736            AA
## 9     2011     1         10         1    1639    1740            AA
## 10    2011     1         12         3    1631    1739            AA
## 11    2011     1         15         6    1632    1736            AA
## 12    2011     1         17         1    1632    1744            AA
## 13    2011     1         27         4    1634    1740            AA
## 14    2011     1         30         7    1635    1733            AA
## 15    2011     1          1         6    1756    2112            AA
## 16    2011     1          7         5    1757    2108            AA
## 17    2011     1         31         1    1757    2101            AA
## 18    2011     1          5         3     916    1019            AA
## 19    2011     1         24         1     912    1017            AA
## 20    2011     1          4         2    1026    1333            AA
## 21    2011     1          5         3    1021    1331            AA
## 22    2011     1          9         7    1029    1338            AA
## 23    2011     1         14         5    1024    1327            AA
## 24    2011     1         16         7    1021    1332            AA
## 25    2011     1         12         3    1206    1305            AA
## 26    2011     1         23         7    1217    1308            AA
## 27    2011     1         16         7     605     910            AA
## 28    2011     1          3         1    1827    2107            AS
## 29    2011     1         25         2    1826    2101            AS
## 30    2011     1          2         7     703    1113            B6
## 31    2011     1          5         3    1544    1954            B6
## 32    2011     1          7         5    1542    1956            B6
## 33    2011     1         20         4    1538    1952            B6
## 34    2011     1         22         6     701    1106            B6
## 35    2011     1         24         1     707    1059            B6
## 36    2011     1         31         1    1206    1631            CO
## 37    2011     1         31         1     851    1052            CO
## 38    2011     1         31         1    1551    2009            CO
## 39    2011     1         31         1    1024    1621            CO
## 40    2011     1         31         1    1554    1918            CO
## 41    2011     1         31         1    2102    2216            CO
## 42    2011     1         31         1    1031    1203            CO
## 43    2011     1         31         1    1146    1421            CO
## 44    2011     1         31         1    1748    2001            CO
## 45    2011     1         31         1    1901    2332            CO
## 46    2011     1         31         1    1917    2234            CO
## 47    2011     1         31         1    1915    2248            CO
##       FlightNum TailNum ActualElapsedTime AirTime ArrDelay DepDelay Origin
## 1           428  N557AA                60      45       -9        1    IAH
## 2           428  N492AA                62      44       -3        5    IAH
## 3           428  N507AA                60      42       -2        8    IAH
## 4           460  N558AA                66      46       -8        1    IAH
## 5           533  N555AA                58      39       -7       10    IAH
## 6           533  N4XCAA                56      44       -4       15    IAH
## 7           533  N403AA                54      39      -17        4    IAH
## 8          1121  N4WVAA                65      37       -9        1    IAH
## 9          1121  N531AA                61      41       -5        9    IAH
## 10         1121  N468AA                68      44       -6        1    IAH
## 11         1121  N274AA                64      48       -9        2    IAH
## 12         1121  N580AA                72      51       -1        2    IAH
## 13         1121  N557AA                66      43       -5        4    IAH
## 14         1121  N403AA                58      39      -12        5    IAH
## 15         1294  N3DGAA               136     113       -3        1    IAH
## 16         1294  N3CYAA               131     107       -7        2    IAH
## 17         1294  N3DJAA               124     104      -14        2    IAH
## 18         1436  N548AA                63      40       -6        6    IAH
## 19         1436  N567AA                65      42       -8        2    IAH
## 20         1700  N3BAAA               127     109       -7        6    IAH
## 21         1700  N3CDAA               130     108       -9        1    IAH
## 22         1700  N3CPAA               129     110       -2        9    IAH
## 23         1700  N3AUAA               123     110      -13        4    IAH
## 24         1700  N3BJAA               131     118       -8        1    IAH
## 25         1820  N593AA                59      41       -5        1    IAH
## 26         1820  N436AA                51      39       -2       12    IAH
## 27         1994  N3AHAA               125     112       -5        5    IAH
## 28          731  N627AS               280     260       -3        2    IAH
## 29          731  N607AS               275     255       -9        1    IAH
## 30          620  N324JB               190     172       -6        3    HOU
## 31          624  N644JB               190     166       -6        9    HOU
## 32          624  N564JB               194     175       -4        7    HOU
## 33          624  N760JB               194     161       -8        3    HOU
## 34          620  N527JB               185     161      -11        1    HOU
## 35          620  N605JB               172     156      -18        7    HOU
## 36           60  N68159               205     165       -2        1    IAH
## 37          170  N35407               241     225      -27        1    IAH
## 38          210  N37408               198     166      -15        1    IAH
## 39          212  N53441               237     215       -4        9    IAH
## 40          244  N37274               144     121      -16        9    IAH
## 41          299  N17244               134     119      -10        8    IAH
## 42          403  N35271               212     194       -1        1    IAH
## 43          546  N77518               155     116       -1        1    IAH
## 44          570  N75436               253     236       -4        3    IAH
## 45          582  N19621               211     188       -1        6    IAH
## 46          644  N37267               137     116       -3       22    IAH
## 47          658  N37273               153     128       -6       10    IAH
##       Dest Distance TaxiIn TaxiOut Cancelled CancellationCode Diverted
## 1      DFW      224      6       9         0                         0
## 2      DFW      224      9       9         0                         0
## 3      DFW      224      7      11         0                         0
## 4      DFW      224      7      13         0                         0
## 5      DFW      224      9      10         0                         0
## 6      DFW      224      4       8         0                         0
## 7      DFW      224      9       6         0                         0
## 8      DFW      224     16      12         0                         0
## 9      DFW      224      8      12         0                         0
## 10     DFW      224      5      19         0                         0
## 11     DFW      224      5      11         0                         0
## 12     DFW      224     10      11         0                         0
## 13     DFW      224     10      13         0                         0
## 14     DFW      224      9      10         0                         0
## 15     MIA      964      9      14         0                         0
## 16     MIA      964      9      15         0                         0
## 17     MIA      964      4      16         0                         0
## 18     DFW      224      4      19         0                         0
## 19     DFW      224      9      14         0                         0
## 20     MIA      964      5      13         0                         0
## 21     MIA      964     10      12         0                         0
## 22     MIA      964     10       9         0                         0
## 23     MIA      964      5       8         0                         0
## 24     MIA      964      4       9         0                         0
## 25     DFW      224      8      10         0                         0
## 26     DFW      224      5       7         0                         0
## 27     MIA      964      4       9         0                         0
## 28     SEA     1874      4      16         0                         0
## 29     SEA     1874      4      16         0                         0
## 30     JFK     1428      6      12         0                         0
## 31     JFK     1428     14      10         0                         0
## 32     JFK     1428      9      10         0                         0
## 33     JFK     1428     16      17         0                         0
## 34     JFK     1428      3      21         0                         0
## 35     JFK     1428      4      12         0                         0
## 36     EWR     1400     29      11         0                         0
## 37     SFO     1635      6      10         0                         0
## 38     EWR     1400     16      16         0                         0
## 39     SJU     2007      6      16         0                         0
## 40     CLE     1091      5      18         0                         0
## 41     DEN      862      6       9         0                         0
## 42     SAN     1303      4      14         0                         0
## 43     ORD      925     10      29         0                         0
## 44     SFO     1635      7      10         0                         0
## 45     BOS     1597      7      16         0                         0
## 46     CLE     1091      5      16         0                         0
## 47     DCA     1208      9      16         0                         0
##  [ reached getOption("max.print") -- omitted 27665 rows ]
# All flights that were cancelled after being delayed
filter(hflights, Cancelled == 1 & DepDelay > 0)
##    Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier FlightNum
## 1  2011     1         26         3    1926      NA            CO       310
## 2  2011     1         11         2    1100      NA            US       944
## 3  2011     1         19         3    1811      NA            XE      2376
## 4  2011     1          7         5    2028      NA            XE      3050
## 5  2011     2          4         5    1638      NA            AA      1121
## 6  2011     2          8         2    1057      NA            CO       408
## 7  2011     2          2         3     802      NA            XE      2189
## 8  2011     2          9         3     904      NA            XE      2605
## 9  2011     2          1         2    1508      NA            OO      5812
## 10 2011     3         31         4    1016      NA            CO       586
## 11 2011     4          4         1    1632      NA            DL         8
## 12 2011     4          8         5    1608      NA            WN         4
## 13 2011     4         21         4     953      NA            WN      3840
## 14 2011     4          4         1    1946      NA            XE      2700
## 15 2011     4          4         1    1854      NA            XE      2748
## 16 2011     4         11         1    1400      NA            XE      2769
## 17 2011     4         25         1    1131      NA            XE      2826
## 18 2011     4         25         1    1257      NA            XE      2964
## 19 2011     6         15         3    1740      NA            DL         8
## 20 2011     6         11         6    1649      NA            FL      1595
## 21 2011     7         25         1    1654      NA            CO      1422
## 22 2011     7         21         4    2059      NA            XE      2705
## 23 2011     7         12         2    1556      NA            UA       993
## 24 2011     7          7         4    1245      NA            US      1170
## 25 2011     8         14         7    1938      NA            XE      3019
## 26 2011     8         11         4    2002      NA            OO      5810
## 27 2011     8         18         4    1808      NA            AA      1294
## 28 2011     9         29         4    1800      NA            XE      2125
## 29 2011     9         26         1    1544      NA            EV      5222
## 30 2011     9         16         5    1807      NA            OO      2009
## 31 2011     9         16         5    1234      NA            OO      2059
## 32 2011    10         24         1     631      NA            WN         2
## 33 2011    11          8         2    1855      NA            XE      4343
## 34 2011    11         15         2    1452      NA            XE      4450
## 35 2011    12         13         2    2324      NA            XE      4088
## 36 2011    12         11         7    2245      NA            XE      4342
## 37 2011    12         21         3    2159      NA            XE      4595
## 38 2011    12         31         6    1738      NA            XE      4662
## 39 2011    12         22         4    1633      NA            OO      5159
## 40 2011    12         28         3    1827      NA            OO      5244
##    TailNum ActualElapsedTime AirTime ArrDelay DepDelay Origin Dest
## 1   N77865                NA      NA       NA       26    IAH  EWR
## 2   N452UW                NA      NA       NA      135    IAH  CLT
## 3   N15932                NA      NA       NA        6    IAH  ICT
## 4   N15912                NA      NA       NA       73    IAH  JAX
## 5   N537AA                NA      NA       NA        8    IAH  DFW
## 6   N11641                NA      NA       NA      187    IAH  EWR
## 7   N17928                NA      NA       NA        2    IAH  DAL
## 8   N15941                NA      NA       NA        4    IAH  DAL
## 9   N959SW                NA      NA       NA       28    IAH  ATL
## 10  N19136                NA      NA       NA      156    IAH  MCO
## 11  N600TR                NA      NA       NA       42    IAH  ATL
## 12  N365SW                NA      NA       NA      548    HOU  DAL
## 13  N455WN                NA      NA       NA        3    HOU  SAT
## 14  N15973                NA      NA       NA      131    IAH  ATL
## 15  N13968                NA      NA       NA      109    IAH  MOB
## 16  N13970                NA      NA       NA        5    IAH  ELP
## 17  N15926                NA      NA       NA        1    IAH  DAL
## 18  N13936                NA      NA       NA       87    IAH  DFW
## 19  N704DK                NA      NA       NA      110    IAH  ATL
## 20  N946AT                NA      NA       NA       64    HOU  BKG
## 21  N58606                NA      NA       NA       24    IAH  ATL
## 22  N15926                NA      NA       NA      271    IAH  GPT
## 23  N808UA                NA      NA       NA      110    IAH  IAD
## 24  N456UW                NA      NA       NA      153    IAH  CLT
## 25  N13994                NA      NA       NA      173    IAH  CMH
## 26  N978SW                NA      NA       NA       37    IAH  MAF
## 27  N3FLAA                NA      NA       NA        3    IAH  MIA
## 28  N14570                NA      NA       NA       83    IAH  MOB
## 29  N851AS                NA      NA       NA      220    IAH  MEM
## 30  N794SK                NA      NA       NA       27    IAH  DFW
## 31  N752SK                NA      NA       NA       64    IAH  ELP
## 32  N359SW                NA      NA       NA        1    HOU  DAL
## 33  N16944                NA      NA       NA       75    IAH  LFT
## 34  N13903                NA      NA       NA       21    IAH  LFT
## 35  N14938                NA      NA       NA      129    IAH  LFT
## 36  N11535                NA      NA       NA       80    IAH  LRD
## 37  N16149                NA      NA       NA       64    IAH  VPS
## 38  N16944                NA      NA       NA       21    IAH  PNS
## 39  N724SK                NA      NA       NA      103    IAH  ATL
## 40  N912SW                NA      NA       NA       27    IAH  ELP
##    Distance TaxiIn TaxiOut Cancelled CancellationCode Diverted
## 1      1400     NA      NA         1                B        0
## 2       913     NA      NA         1                B        0
## 3       542     NA      NA         1                B        0
## 4       817     NA      19         1                A        0
## 5       224     NA      19         1                A        0
## 6      1400     NA      NA         1                A        0
## 7       217     NA      NA         1                B        0
## 8       217     NA      NA         1                B        0
## 9       689     NA      19         1                A        0
## 10      853     NA      NA         1                B        0
## 11      689     NA      NA         1                A        0
## 12      239     NA      NA         1                A        0
## 13      192     NA       5         1                A        0
## 14      689     NA      19         1                B        0
## 15      427     NA      NA         1                A        0
## 16      667     NA      26         1                A        0
## 17      217     NA      13         1                B        0
## 18      224     NA      NA         1                B        0
## 19      689     NA      NA         1                B        0
## 20      490     NA      25         1                A        0
## 21      689     NA      NA         1                C        0
## 22      376     NA      NA         1                A        0
## 23     1190     NA      NA         1                A        0
## 24      912     NA      15         1                A        0
## 25      986     NA      NA         1                B        0
## 26      429     NA      15         1                A        0
## 27      964     NA      NA         1                A        0
## 28      427     NA      NA         1                C        0
## 29      468     NA      NA         1                A        0
## 30      224     NA      10         1                B        0
## 31      667     NA      12         1                B        0
## 32      239     NA       6         1                A        0
## 33      201     NA      NA         1                B        0
## 34      201     NA      NA         1                B        0
## 35      201     NA       8         1                B        0
## 36      301     NA       8         1                B        0
## 37      528     NA       9         1                B        0
## 38      489     NA      14         1                B        0
## 39      689     NA      NA         1                B        0
## 40      667     NA      20         1                A        0
# Combinando select, mutate y filter
# Select the flights that had JFK as their destination: c1
c1 <- filter(hflights, Dest == "JFK")

# Combine the Year, Month and DayofMonth variables to create a Date column: c2
c2 <- mutate(c1, Date = paste(Year, Month, DayofMonth, sep = "-"))

# Print out a selection of columns of c2
select(c2, Date, DepTime, ArrTime, TailNum)
##           Date DepTime ArrTime TailNum
## 1     2011-1-1     654    1124  N324JB
## 2     2011-1-1    1639    2110  N324JB
## 3     2011-1-2     703    1113  N324JB
## 4     2011-1-2    1604    2040  N324JB
## 5     2011-1-3     659    1100  N229JB
## 6     2011-1-3    1801    2200  N206JB
## 7     2011-1-4     654    1103  N267JB
## 8     2011-1-4    1608    2034  N267JB
## 9     2011-1-5     700    1103  N708JB
## 10    2011-1-5    1544    1954  N644JB
## 11    2011-1-6    1532    1943  N641JB
## 12    2011-1-7     654    1117  N641JB
## 13    2011-1-7    1542    1956  N564JB
## 14    2011-1-8     654    1058  N630JB
## 15    2011-1-9     653    1059  N599JB
## 16    2011-1-9    1618    2057  N625JB
## 17   2011-1-10     656    1102  N625JB
## 18   2011-1-10    1554    2001  N504JB
## 19   2011-1-11     653    1053  N504JB
## 20   2011-1-11      NA      NA  N537JB
## 21   2011-1-12    1532    1953  N504JB
## 22   2011-1-13    1522    1938  N597JB
## 23   2011-1-14     808    1229  N597JB
## 24   2011-1-14    1534    2015  N729JB
## 25   2011-1-15     700    1114  N503JB
## 26   2011-1-16     652    1055  N706JB
## 27   2011-1-16    1551    2004  N565JB
## 28   2011-1-17     730    1135  N523JB
## 29   2011-1-17    1531    1946  N779JB
## 30   2011-1-18     659    1102  N779JB
## 31   2011-1-18    1647    2056  N729JB
## 32   2011-1-19      NA      NA  N504JB
## 33   2011-1-20    1538    1952  N760JB
## 34   2011-1-21     656    1104  N760JB
## 35   2011-1-21    1725    2135  N598JB
## 36   2011-1-22     701    1106  N527JB
## 37   2011-1-23     658    1058  N580JB
## 38   2011-1-23    1535    1933  N599JB
## 39   2011-1-24     707    1059  N605JB
## 40   2011-1-24    1532    1923  N536JB
## 41   2011-1-25     658    1102  N589JB
## 42   2011-1-25    1623    2029  N621JB
## 43   2011-1-26    1535    1941  N659JB
## 44   2011-1-27      NA      NA  N569JB
## 45   2011-1-28     655    1107  N594JB
## 46   2011-1-28    1538    2013  N655JB
## 47   2011-1-29     657    1128  N508JB
## 48   2011-1-30     651    1106  N606JB
## 49   2011-1-30    1659    2118  N606JB
## 50   2011-1-31     659    1111  N661JB
## 51   2011-1-31    1532    1942  N629JB
## 52    2011-2-1      NA      NA  N629JB
## 53    2011-2-1      NA      NA  N558JB
## 54    2011-2-2    1531    1946  N768JB
## 55    2011-2-3    1938    2341  N552JB
## 56    2011-2-4      NA      NA  N665JB
## 57    2011-2-4    1547    2000  N570JB
## 58    2011-2-5     700    1113  N653JB
## 59    2011-2-6     700    1045  N657JB
## 60    2011-2-6    1529    1917  N531JB
## 61    2011-2-7     659    1045  N607JB
## 62    2011-2-7    1537    1952  N607JB
## 63    2011-2-8     654    1049  N607JB
## 64    2011-2-8    1546    1955  N606JB
## 65    2011-2-9    1533    1934  N618JB
## 66   2011-2-10    1550    1956  N565JB
## 67   2011-2-11     659    1100  N580JB
## 68   2011-2-11    1538    1937  N586JB
## 69   2011-2-12     659    1105  N612JB
## 70   2011-2-13     809    1235  N503JB
## 71   2011-2-13    1529    2012  N630JB
## 72   2011-2-14     655    1113  N526JB
## 73   2011-2-14    1532    1950  N648JB
## 74   2011-2-15     658    1118  N648JB
## 75   2011-2-15    1541    2007  N615JB
## 76   2011-2-16    1527    2003  N661JB
## 77   2011-2-17     708    2058  N247JB
## 78   2011-2-17    1742    2200  N289JB
## 79   2011-2-18     658    1109  N228JB
## 80   2011-2-18    1534    2024  N228JB
## 81   2011-2-19     703    1114  N228JB
## 82   2011-2-19    1653    2207  N228JB
## 83   2011-2-20    1107    1517  N766JB
## 84   2011-2-20    1530    2000  N190JB
## 85   2011-2-21     657    1106  N316JB
## 86   2011-2-21    1545    1956  N179JB
## 87   2011-2-22     659    1108  N197JB
## 88   2011-2-22    1535    1949  N197JB
## 89   2011-2-23     700    1106  N318JB
## 90   2011-2-23    1546    2002  N318JB
## 91   2011-2-24     653    1102  N179JB
## 92   2011-2-24    1545    2013  N179JB
## 93   2011-2-25     653    1130  N292JB
## 94   2011-2-25    1824    2314  N273JB
## 95   2011-2-26     652    1054  N292JB
## 96   2011-2-26    1538    1951  N292JB
## 97   2011-2-27     658    1107  N198JB
## 98   2011-2-27    1606    2020  N198JB
## 99   2011-2-28     655    1115  N206JB
## 100  2011-2-28    1710    2145  N206JB
## 101   2011-3-1     657    1115  N309JB
## 102   2011-3-1    1608    2022  N316JB
## 103   2011-3-2     659    1121  N236JB
## 104   2011-3-2    1657    2128  N298JB
## 105   2011-3-3     659    1148  N266JB
## 106   2011-3-3    1553    2013  N266JB
## 107   2011-3-4     700    1114  N316JB
## 108   2011-3-4    1537    2001  N316JB
## 109   2011-3-5     656    1119  N197JB
## 110   2011-3-6     656    1109  N197JB
## 111   2011-3-6    1839       5  N192JB
## 112   2011-3-7      NA      NA  N190JB
## 113   2011-3-7    1716    2128  N229JB
## 114   2011-3-8     655    1106  N228JB
## 115   2011-3-8    1555    2006  N228JB
## 116   2011-3-9     700    1112  N296JB
## 117   2011-3-9    1551    2052  N296JB
## 118  2011-3-10     706    1109  N307JB
## 119  2011-3-10    2013     141  N307JB
## 120  2011-3-11     704    1116  N304JB
## 121  2011-3-11    1529    1953  N316JB
## 122  2011-3-12     651    1128  N266JB
## 123  2011-3-13     700    1108  N266JB
## 124  2011-3-13    1526    2028  N298JB
## 125  2011-3-14     652    1106  N281JB
## 126  2011-3-14    1547    2003  N229JB
## 127  2011-3-15     659    1127  N317JB
## 128  2011-3-15    1542    1958  N317JB
## 129  2011-3-16     650    1102  N192JB
## 130  2011-3-16    1538    1953  N192JB
## 131  2011-3-17     655    1133  N323JB
## 132  2011-3-17    1532    2009  N323JB
## 133  2011-3-18     651    1127  N289JB
## 134  2011-3-18    1603    2022  N258JB
## 135  2011-3-19     655    1113  N307JB
## 136  2011-3-20     657    1119  N307JB
## 137  2011-3-20    1530    2000  N307JB
## 138  2011-3-21     656    1120  N317JB
## 139  2011-3-21    1700    2140  N317JB
## 140  2011-3-22     654    1118  N198JB
## 141  2011-3-22    1534    2001  N198JB
## 142  2011-3-23     652    1058  N179JB
## 143  2011-3-23    1649    2130  N179JB
## 144  2011-3-24    1040    1504  N198JB
## 145  2011-3-24    1703    2116  N267JB
## 146  2011-3-25     653    1112  N316JB
## 147  2011-3-25    1540    1950  N316JB
## 148  2011-3-26     656    1104  N283JB
## 149  2011-3-27     657    1107  N283JB
## 150  2011-3-27    1537    1937  N283JB
## 151  2011-3-28     652    1104  N236JB
## 152  2011-3-28    1528    1938  N236JB
## 153  2011-3-29     650    1101  N229JB
## 154  2011-3-29    1624    2033  N236JB
## 155  2011-3-30     655    1106  N289JB
## 156  2011-3-30    1606    2039  N289JB
## 157  2011-3-31     652    1056  N266JB
## 158  2011-3-31    1531    1951  N266JB
## 159   2011-4-1     734    1146  N236JB
## 160   2011-4-1    1615    2048  N296JB
## 161   2011-4-2     658    1109  N187JB
## 162   2011-4-3     655    1122  N229JB
## 163   2011-4-3    1530    2005  N317JB
## 164   2011-4-4     659    1111  N317JB
## 165   2011-4-4    1537    2037  N317JB
## 166   2011-4-5     656    1130  N197JB
## 167   2011-4-5    1533    1954  N198JB
## 168   2011-4-6     655    1111  N323JB
## 169   2011-4-6    1556    2043  N238JB
## 170   2011-4-7     656    1111  N324JB
## 171   2011-4-7    1519    1946  N206JB
## 172   2011-4-8     652    1112  N203JB
## 173   2011-4-8    1526    1937  N316JB
## 174   2011-4-9     656    1125  N274JB
## 175  2011-4-10     704    1127  N317JB
## 176  2011-4-10    1529    2004  N239JB
## 177  2011-4-11     657    1124  N304JB
## 178  2011-4-11    1552    2050  N318JB
## 179  2011-4-12     654    1118  N318JB
## 180  2011-4-12    1531    2020  N183JB
## 181  2011-4-13     656    1106  N266JB
## 182  2011-4-13    1624    2151  N298JB
## 183  2011-4-14     652    1101  N247JB
## 184  2011-4-14    1524    1954  N247JB
## 185  2011-4-15     657    1117  N294JB
## 186  2011-4-15    1529    2004  N265JB
## 187  2011-4-16     813    1237  N183JB
## 188  2011-4-17     703    1108  N279JB
## 189  2011-4-17      NA      NA  N306JB
## 190  2011-4-18     657    1109  N309JB
## 191  2011-4-18    1531    2020  N216JB
## 192  2011-4-19     701    1115  N216JB
## 193  2011-4-19    1541    2024  N316JB
## 194  2011-4-20     849    1306  N183JB
## 195  2011-4-20      NA      NA  N216JB
## 196  2011-4-21     654    1120  N192JB
## 197  2011-4-21    1527    2023  N187JB
## 198  2011-4-22     656    1111  N187JB
## 199  2011-4-22    1526    1952  N216JB
## 200  2011-4-23     655    1120  N216JB
## 201  2011-4-24     701    1123  N178JB
## 202  2011-4-24    1529     153  N273JB
## 203  2011-4-25     653    1115  N179JB
## 204  2011-4-25    1526    2008  N316JB
## 205  2011-4-26     656    1122  N316JB
## 206  2011-4-26    1618    2104  N323JB
## 207  2011-4-27     657    1123  N323JB
## 208  2011-4-27    1529    2124  N283JB
## 209  2011-4-28     659    1726  N229JB
## 210  2011-4-28      NA      NA  N228JB
## 211  2011-4-29     658    1105  N203JB
## 212  2011-4-29    1529    1943  N228JB
## 213  2011-4-30     654    1128  N190JB
## 214   2011-5-1     701    1121  N231JB
## 215   2011-5-1    1545    2009  N206JB
## 216   2011-5-2     702    1127  N206JB
## 217   2011-5-2    1521    2015  N316JB
## 218   2011-5-3     653    1101  N192JB
## 219   2011-5-3    1618    2031  N283JB
## 220   2011-5-4     700    1056  N190JB
## 221   2011-5-4    1604    2123  N316JB
## 222   2011-5-5     654    1056  N231JB
## 223   2011-5-5    1530    2028  N279JB
## 224   2011-5-6     702    1116  N279JB
## 225   2011-5-6    1539    1945  N304JB
## 226   2011-5-7     659    1106  N304JB
## 227   2011-5-8     701    1112  N306JB
## 228   2011-5-8    1517    1939  N216JB
## 229   2011-5-9     701    1125  N296JB
## 230   2011-5-9    1511    1936  N236JB
## 231  2011-5-10     654    1129  N316JB
## 232  2011-5-10    1516    1951  N216JB
## 233  2011-5-11     656    1134  N216JB
## 234  2011-5-11    1513    1942  N236JB
## 235  2011-5-12     656    1126  N184JB
## 236  2011-5-12    1511    1949  N298JB
## 237  2011-5-13     653    1115  N298JB
## 238  2011-5-13    1526    2019  N228JB
## 239  2011-5-14     658    1121  N203JB
## 240  2011-5-15     706    1125  N266JB
## 241  2011-5-15    1810    2312  N203JB
## 242  2011-5-16     659    1115  N197JB
## 243  2011-5-16    1531    2025  N267JB
## 244  2011-5-17     654    1109  N283JB
## 245  2011-5-17    1620    2045  N197JB
## 246  2011-5-18     700    1119  N192JB
## 247  2011-5-18    1758    2212  N267JB
## 248  2011-5-19     709    1120  N267JB
## 249  2011-5-19    1545    2025  N281JB
## 250  2011-5-20     654    1118  N247JB
##  [ reached getOption("max.print") -- omitted 445 rows ]

7.2.4 arrange()

La función arrange() nos permite ordenar los datos. Su estructura es similar a las anteriores funciones, donde indicamos al principio el nombre del conjunto de datos, seguido de la primera columna a ordenar, a continuación la segunda, etc. En este caso, el orden de las columnas dentro de la función importa, ya que la jerarquía va desde la primera a la última.

Esta función ordena los datos según orden alfabético (si es cadena de texto), numérico o incluso factorial. En este último caso, cada factor tiene un número asociado que podríamos ver con levels().

De forma predeterminada, la función arrange() ordena de menor a mayor.

# Definition of dtc
dtc <- filter(hflights, Cancelled == 1, !is.na(DepDelay))

# Arrange dtc by departure delays
arrange(dtc, DepDelay)
##    Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier FlightNum
## 1  2011     7         23         6     605      NA            F9       225
## 2  2011     1         17         1     916      NA            XE      3068
## 3  2011    12          1         4     541      NA            US       282
## 4  2011    10         12         3    2022      NA            MQ      3724
## 5  2011     7         29         5    1424      NA            CO      1079
## 6  2011     9         29         4    1639      NA            OO      2062
## 7  2011     2          9         3     555      NA            MQ      3265
## 8  2011     5          9         1     715      NA            OO      1177
## 9  2011     1         20         4    1413      NA            UA       552
## 10 2011     1         17         1     831      NA            WN         1
## 11 2011     2         21         1    2257      NA            OO      1111
## 12 2011     3         18         5     727      NA            UA       109
## 13 2011     4         30         6     612      NA            EV      5386
## 14 2011     4         10         7    1147      NA            EV      5402
## 15 2011     5         23         1     657      NA            EV      5445
## 16 2011     6         20         1    1037      NA            OO      5817
## 17 2011     7         17         7    1917      NA            MQ      3717
## 18 2011    12         25         7    1652      NA            XE      4375
## 19 2011     4          7         4    2118      NA            OO      5819
## 20 2011     9         29         4     723      NA            EV      4882
## 21 2011    10         12         3     758      NA            WN         8
## 22 2011     5         16         1    1619      NA            EV      5401
## 23 2011     6         26         7    1629      NA            WN        42
## 24 2011     8         26         5    1054      NA            XE      2397
## 25 2011     8         18         4    1809      NA            OO      2028
## 26 2011     1         26         3    1703      NA            CO       410
## 27 2011     8         11         4    1320      NA            CO      1669
## 28 2011    12         11         7    1303      NA            UA       399
## 29 2011     4         25         1    1131      NA            XE      2826
## 30 2011    10         24         1     631      NA            WN         2
## 31 2011     2          2         3     802      NA            XE      2189
## 32 2011     4         21         4     953      NA            WN      3840
## 33 2011     8         18         4    1808      NA            AA      1294
## 34 2011     2          9         3     904      NA            XE      2605
## 35 2011     4         11         1    1400      NA            XE      2769
## 36 2011     1         19         3    1811      NA            XE      2376
## 37 2011     2          4         5    1638      NA            AA      1121
## 38 2011    11         15         2    1452      NA            XE      4450
## 39 2011    12         31         6    1738      NA            XE      4662
## 40 2011     7         25         1    1654      NA            CO      1422
## 41 2011     1         26         3    1926      NA            CO       310
## 42 2011     9         16         5    1807      NA            OO      2009
## 43 2011    12         28         3    1827      NA            OO      5244
## 44 2011     2          1         2    1508      NA            OO      5812
## 45 2011     8         11         4    2002      NA            OO      5810
## 46 2011     4          4         1    1632      NA            DL         8
## 47 2011     6         11         6    1649      NA            FL      1595
##    TailNum ActualElapsedTime AirTime ArrDelay DepDelay Origin Dest
## 1   N912FR                NA      NA       NA      -10    HOU  DEN
## 2   N13936                NA      NA       NA       -9    IAH  HRL
## 3   N840AW                NA      NA       NA       -9    IAH  PHX
## 4   N539MQ                NA      NA       NA       -8    IAH  LAX
## 5   N14628                NA      NA       NA       -6    IAH  ORD
## 6   N724SK                NA      NA       NA       -6    IAH  ATL
## 7   N613MQ                NA      NA       NA       -5    HOU  DFW
## 8   N758SK                NA      NA       NA       -5    IAH  DTW
## 9   N509UA                NA      NA       NA       -4    IAH  IAD
## 10  N714CB                NA      NA       NA       -4    HOU  HRL
## 11  N778SK                NA      NA       NA       -3    IAH  AUS
## 12  N469UA                NA      NA       NA       -3    IAH  DEN
## 13  N844AS                NA      NA       NA       -3    IAH  MEM
## 14  N684BR                NA      NA       NA       -3    IAH  MEM
## 15  N606LR                NA      NA       NA       -3    HOU  ATL
## 16  N443SW                NA      NA       NA       -3    IAH  CLT
## 17  N503MQ                NA      NA       NA       -3    IAH  ORD
## 18  N12946                NA      NA       NA       -3    IAH  MOB
## 19  N916SW                NA      NA       NA       -2    IAH  MAF
## 20  N355CA                NA      NA       NA       -2    IAH  DTW
## 21  N405WN                NA      NA       NA       -2    HOU  DAL
## 22  N856AS                NA      NA       NA       -1    IAH  MEM
## 23  N640SW                NA      NA       NA       -1    HOU  DAL
## 24  N11192                NA      NA       NA       -1    IAH  CHS
## 25  N758SK                NA      NA       NA       -1    IAH  OMA
## 26  N77296                NA      NA       NA        0    IAH  IAD
## 27  N73275                NA      NA       NA        0    IAH  MIA
## 28  N528UA                NA      NA       NA        0    IAH  DEN
## 29  N15926                NA      NA       NA        1    IAH  DAL
## 30  N359SW                NA      NA       NA        1    HOU  DAL
## 31  N17928                NA      NA       NA        2    IAH  DAL
## 32  N455WN                NA      NA       NA        3    HOU  SAT
## 33  N3FLAA                NA      NA       NA        3    IAH  MIA
## 34  N15941                NA      NA       NA        4    IAH  DAL
## 35  N13970                NA      NA       NA        5    IAH  ELP
## 36  N15932                NA      NA       NA        6    IAH  ICT
## 37  N537AA                NA      NA       NA        8    IAH  DFW
## 38  N13903                NA      NA       NA       21    IAH  LFT
## 39  N16944                NA      NA       NA       21    IAH  PNS
## 40  N58606                NA      NA       NA       24    IAH  ATL
## 41  N77865                NA      NA       NA       26    IAH  EWR
## 42  N794SK                NA      NA       NA       27    IAH  DFW
## 43  N912SW                NA      NA       NA       27    IAH  ELP
## 44  N959SW                NA      NA       NA       28    IAH  ATL
## 45  N978SW                NA      NA       NA       37    IAH  MAF
## 46  N600TR                NA      NA       NA       42    IAH  ATL
## 47  N946AT                NA      NA       NA       64    HOU  BKG
##    Distance TaxiIn TaxiOut Cancelled CancellationCode Diverted
## 1       883     NA      10         1                A        0
## 2       295     NA      NA         1                B        0
## 3      1009     NA      NA         1                A        0
## 4      1379     NA      NA         1                A        0
## 5       925     NA      13         1                A        0
## 6       689     NA      NA         1                B        0
## 7       247     NA      11         1                A        0
## 8      1076     NA      17         1                A        0
## 9      1190     NA      NA         1                A        0
## 10      276     NA       8         1                B        0
## 11      140     NA      NA         1                A        0
## 12      862     NA      NA         1                A        0
## 13      469     NA      NA         1                A        0
## 14      469     NA      NA         1                A        0
## 15      696     NA      NA         1                A        0
## 16      913     NA      NA         1                A        0
## 17      925     NA      NA         1                A        0
## 18      427     NA      NA         1                A        0
## 19      429     NA      NA         1                A        0
## 20     1075     NA      NA         1                A        0
## 21      239     NA      NA         1                A        0
## 22      469     NA      NA         1                A        0
## 23      239     NA      13         1                A        0
## 24      925     NA      NA         1                B        0
## 25      781     NA      32         1                B        0
## 26     1190     NA      13         1                B        0
## 27      964     NA      NA         1                A        0
## 28      862     NA      NA         1                A        0
## 29      217     NA      13         1                B        0
## 30      239     NA       6         1                A        0
## 31      217     NA      NA         1                B        0
## 32      192     NA       5         1                A        0
## 33      964     NA      NA         1                A        0
## 34      217     NA      NA         1                B        0
## 35      667     NA      26         1                A        0
## 36      542     NA      NA         1                B        0
## 37      224     NA      19         1                A        0
## 38      201     NA      NA         1                B        0
## 39      489     NA      14         1                B        0
## 40      689     NA      NA         1                C        0
## 41     1400     NA      NA         1                B        0
## 42      224     NA      10         1                B        0
## 43      667     NA      20         1                A        0
## 44      689     NA      19         1                A        0
## 45      429     NA      15         1                A        0
## 46      689     NA      NA         1                A        0
## 47      490     NA      25         1                A        0
##  [ reached getOption("max.print") -- omitted 21 rows ]
# Arrange dtc so that cancellation reasons are grouped
arrange(dtc, CancellationCode)
##    Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier FlightNum
## 1  2011     1         20         4    1413      NA            UA       552
## 2  2011     1          7         5    2028      NA            XE      3050
## 3  2011     2          4         5    1638      NA            AA      1121
## 4  2011     2          8         2    1057      NA            CO       408
## 5  2011     2          1         2    1508      NA            OO      5812
## 6  2011     2         21         1    2257      NA            OO      1111
## 7  2011     2          9         3     555      NA            MQ      3265
## 8  2011     3         18         5     727      NA            UA       109
## 9  2011     4          4         1    1632      NA            DL         8
## 10 2011     4          8         5    1608      NA            WN         4
## 11 2011     4         21         4     953      NA            WN      3840
## 12 2011     4          4         1    1854      NA            XE      2748
## 13 2011     4         11         1    1400      NA            XE      2769
## 14 2011     4          7         4    2118      NA            OO      5819
## 15 2011     4         30         6     612      NA            EV      5386
## 16 2011     4         10         7    1147      NA            EV      5402
## 17 2011     5          9         1     715      NA            OO      1177
## 18 2011     5         16         1    1619      NA            EV      5401
## 19 2011     5         23         1     657      NA            EV      5445
## 20 2011     6         26         7    1629      NA            WN        42
## 21 2011     6         20         1    1037      NA            OO      5817
## 22 2011     6         11         6    1649      NA            FL      1595
## 23 2011     7         29         5    1424      NA            CO      1079
## 24 2011     7         21         4    2059      NA            XE      2705
## 25 2011     7         23         6     605      NA            F9       225
## 26 2011     7         17         7    1917      NA            MQ      3717
## 27 2011     7         12         2    1556      NA            UA       993
## 28 2011     7          7         4    1245      NA            US      1170
## 29 2011     8         11         4    2002      NA            OO      5810
## 30 2011     8         18         4    1808      NA            AA      1294
## 31 2011     8         11         4    1320      NA            CO      1669
## 32 2011     9         29         4     723      NA            EV      4882
## 33 2011     9         26         1    1544      NA            EV      5222
## 34 2011    10         12         3     758      NA            WN         8
## 35 2011    10         24         1     631      NA            WN         2
## 36 2011    10         12         3    2022      NA            MQ      3724
## 37 2011    12         25         7    1652      NA            XE      4375
## 38 2011    12         28         3    1827      NA            OO      5244
## 39 2011    12         11         7    1303      NA            UA       399
## 40 2011    12          1         4     541      NA            US       282
## 41 2011     1         26         3    1926      NA            CO       310
## 42 2011     1         26         3    1703      NA            CO       410
## 43 2011     1         11         2    1100      NA            US       944
## 44 2011     1         17         1     831      NA            WN         1
## 45 2011     1         19         3    1811      NA            XE      2376
## 46 2011     1         17         1     916      NA            XE      3068
## 47 2011     2          2         3     802      NA            XE      2189
##    TailNum ActualElapsedTime AirTime ArrDelay DepDelay Origin Dest
## 1   N509UA                NA      NA       NA       -4    IAH  IAD
## 2   N15912                NA      NA       NA       73    IAH  JAX
## 3   N537AA                NA      NA       NA        8    IAH  DFW
## 4   N11641                NA      NA       NA      187    IAH  EWR
## 5   N959SW                NA      NA       NA       28    IAH  ATL
## 6   N778SK                NA      NA       NA       -3    IAH  AUS
## 7   N613MQ                NA      NA       NA       -5    HOU  DFW
## 8   N469UA                NA      NA       NA       -3    IAH  DEN
## 9   N600TR                NA      NA       NA       42    IAH  ATL
## 10  N365SW                NA      NA       NA      548    HOU  DAL
## 11  N455WN                NA      NA       NA        3    HOU  SAT
## 12  N13968                NA      NA       NA      109    IAH  MOB
## 13  N13970                NA      NA       NA        5    IAH  ELP
## 14  N916SW                NA      NA       NA       -2    IAH  MAF
## 15  N844AS                NA      NA       NA       -3    IAH  MEM
## 16  N684BR                NA      NA       NA       -3    IAH  MEM
## 17  N758SK                NA      NA       NA       -5    IAH  DTW
## 18  N856AS                NA      NA       NA       -1    IAH  MEM
## 19  N606LR                NA      NA       NA       -3    HOU  ATL
## 20  N640SW                NA      NA       NA       -1    HOU  DAL
## 21  N443SW                NA      NA       NA       -3    IAH  CLT
## 22  N946AT                NA      NA       NA       64    HOU  BKG
## 23  N14628                NA      NA       NA       -6    IAH  ORD
## 24  N15926                NA      NA       NA      271    IAH  GPT
## 25  N912FR                NA      NA       NA      -10    HOU  DEN
## 26  N503MQ                NA      NA       NA       -3    IAH  ORD
## 27  N808UA                NA      NA       NA      110    IAH  IAD
## 28  N456UW                NA      NA       NA      153    IAH  CLT
## 29  N978SW                NA      NA       NA       37    IAH  MAF
## 30  N3FLAA                NA      NA       NA        3    IAH  MIA
## 31  N73275                NA      NA       NA        0    IAH  MIA
## 32  N355CA                NA      NA       NA       -2    IAH  DTW
## 33  N851AS                NA      NA       NA      220    IAH  MEM
## 34  N405WN                NA      NA       NA       -2    HOU  DAL
## 35  N359SW                NA      NA       NA        1    HOU  DAL
## 36  N539MQ                NA      NA       NA       -8    IAH  LAX
## 37  N12946                NA      NA       NA       -3    IAH  MOB
## 38  N912SW                NA      NA       NA       27    IAH  ELP
## 39  N528UA                NA      NA       NA        0    IAH  DEN
## 40  N840AW                NA      NA       NA       -9    IAH  PHX
## 41  N77865                NA      NA       NA       26    IAH  EWR
## 42  N77296                NA      NA       NA        0    IAH  IAD
## 43  N452UW                NA      NA       NA      135    IAH  CLT
## 44  N714CB                NA      NA       NA       -4    HOU  HRL
## 45  N15932                NA      NA       NA        6    IAH  ICT
## 46  N13936                NA      NA       NA       -9    IAH  HRL
## 47  N17928                NA      NA       NA        2    IAH  DAL
##    Distance TaxiIn TaxiOut Cancelled CancellationCode Diverted
## 1      1190     NA      NA         1                A        0
## 2       817     NA      19         1                A        0
## 3       224     NA      19         1                A        0
## 4      1400     NA      NA         1                A        0
## 5       689     NA      19         1                A        0
## 6       140     NA      NA         1                A        0
## 7       247     NA      11         1                A        0
## 8       862     NA      NA         1                A        0
## 9       689     NA      NA         1                A        0
## 10      239     NA      NA         1                A        0
## 11      192     NA       5         1                A        0
## 12      427     NA      NA         1                A        0
## 13      667     NA      26         1                A        0
## 14      429     NA      NA         1                A        0
## 15      469     NA      NA         1                A        0
## 16      469     NA      NA         1                A        0
## 17     1076     NA      17         1                A        0
## 18      469     NA      NA         1                A        0
## 19      696     NA      NA         1                A        0
## 20      239     NA      13         1                A        0
## 21      913     NA      NA         1                A        0
## 22      490     NA      25         1                A        0
## 23      925     NA      13         1                A        0
## 24      376     NA      NA         1                A        0
## 25      883     NA      10         1                A        0
## 26      925     NA      NA         1                A        0
## 27     1190     NA      NA         1                A        0
## 28      912     NA      15         1                A        0
## 29      429     NA      15         1                A        0
## 30      964     NA      NA         1                A        0
## 31      964     NA      NA         1                A        0
## 32     1075     NA      NA         1                A        0
## 33      468     NA      NA         1                A        0
## 34      239     NA      NA         1                A        0
## 35      239     NA       6         1                A        0
## 36     1379     NA      NA         1                A        0
## 37      427     NA      NA         1                A        0
## 38      667     NA      20         1                A        0
## 39      862     NA      NA         1                A        0
## 40     1009     NA      NA         1                A        0
## 41     1400     NA      NA         1                B        0
## 42     1190     NA      13         1                B        0
## 43      913     NA      NA         1                B        0
## 44      276     NA       8         1                B        0
## 45      542     NA      NA         1                B        0
## 46      295     NA      NA         1                B        0
## 47      217     NA      NA         1                B        0
##  [ reached getOption("max.print") -- omitted 21 rows ]
# Arrange dtc according to carrier and departure delays
arrange(dtc, UniqueCarrier, DepDelay)
##    Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier FlightNum
## 1  2011     8         18         4    1808      NA            AA      1294
## 2  2011     2          4         5    1638      NA            AA      1121
## 3  2011     7         29         5    1424      NA            CO      1079
## 4  2011     1         26         3    1703      NA            CO       410
## 5  2011     8         11         4    1320      NA            CO      1669
## 6  2011     7         25         1    1654      NA            CO      1422
## 7  2011     1         26         3    1926      NA            CO       310
## 8  2011     3         31         4    1016      NA            CO       586
## 9  2011     2          8         2    1057      NA            CO       408
## 10 2011     4          4         1    1632      NA            DL         8
## 11 2011     6         15         3    1740      NA            DL         8
## 12 2011     4         30         6     612      NA            EV      5386
## 13 2011     4         10         7    1147      NA            EV      5402
## 14 2011     5         23         1     657      NA            EV      5445
## 15 2011     9         29         4     723      NA            EV      4882
## 16 2011     5         16         1    1619      NA            EV      5401
## 17 2011     9         26         1    1544      NA            EV      5222
## 18 2011     7         23         6     605      NA            F9       225
## 19 2011     6         11         6    1649      NA            FL      1595
## 20 2011    10         12         3    2022      NA            MQ      3724
## 21 2011     2          9         3     555      NA            MQ      3265
## 22 2011     7         17         7    1917      NA            MQ      3717
## 23 2011     9         29         4    1639      NA            OO      2062
## 24 2011     5          9         1     715      NA            OO      1177
## 25 2011     2         21         1    2257      NA            OO      1111
## 26 2011     6         20         1    1037      NA            OO      5817
## 27 2011     4          7         4    2118      NA            OO      5819
## 28 2011     8         18         4    1809      NA            OO      2028
## 29 2011     9         16         5    1807      NA            OO      2009
## 30 2011    12         28         3    1827      NA            OO      5244
## 31 2011     2          1         2    1508      NA            OO      5812
## 32 2011     8         11         4    2002      NA            OO      5810
## 33 2011     9         16         5    1234      NA            OO      2059
## 34 2011    12         22         4    1633      NA            OO      5159
## 35 2011     1         20         4    1413      NA            UA       552
## 36 2011     3         18         5     727      NA            UA       109
## 37 2011    12         11         7    1303      NA            UA       399
## 38 2011     7         12         2    1556      NA            UA       993
## 39 2011    12          1         4     541      NA            US       282
## 40 2011     1         11         2    1100      NA            US       944
## 41 2011     7          7         4    1245      NA            US      1170
## 42 2011     1         17         1     831      NA            WN         1
## 43 2011    10         12         3     758      NA            WN         8
## 44 2011     6         26         7    1629      NA            WN        42
## 45 2011    10         24         1     631      NA            WN         2
## 46 2011     4         21         4     953      NA            WN      3840
## 47 2011     4          8         5    1608      NA            WN         4
##    TailNum ActualElapsedTime AirTime ArrDelay DepDelay Origin Dest
## 1   N3FLAA                NA      NA       NA        3    IAH  MIA
## 2   N537AA                NA      NA       NA        8    IAH  DFW
## 3   N14628                NA      NA       NA       -6    IAH  ORD
## 4   N77296                NA      NA       NA        0    IAH  IAD
## 5   N73275                NA      NA       NA        0    IAH  MIA
## 6   N58606                NA      NA       NA       24    IAH  ATL
## 7   N77865                NA      NA       NA       26    IAH  EWR
## 8   N19136                NA      NA       NA      156    IAH  MCO
## 9   N11641                NA      NA       NA      187    IAH  EWR
## 10  N600TR                NA      NA       NA       42    IAH  ATL
## 11  N704DK                NA      NA       NA      110    IAH  ATL
## 12  N844AS                NA      NA       NA       -3    IAH  MEM
## 13  N684BR                NA      NA       NA       -3    IAH  MEM
## 14  N606LR                NA      NA       NA       -3    HOU  ATL
## 15  N355CA                NA      NA       NA       -2    IAH  DTW
## 16  N856AS                NA      NA       NA       -1    IAH  MEM
## 17  N851AS                NA      NA       NA      220    IAH  MEM
## 18  N912FR                NA      NA       NA      -10    HOU  DEN
## 19  N946AT                NA      NA       NA       64    HOU  BKG
## 20  N539MQ                NA      NA       NA       -8    IAH  LAX
## 21  N613MQ                NA      NA       NA       -5    HOU  DFW
## 22  N503MQ                NA      NA       NA       -3    IAH  ORD
## 23  N724SK                NA      NA       NA       -6    IAH  ATL
## 24  N758SK                NA      NA       NA       -5    IAH  DTW
## 25  N778SK                NA      NA       NA       -3    IAH  AUS
## 26  N443SW                NA      NA       NA       -3    IAH  CLT
## 27  N916SW                NA      NA       NA       -2    IAH  MAF
## 28  N758SK                NA      NA       NA       -1    IAH  OMA
## 29  N794SK                NA      NA       NA       27    IAH  DFW
## 30  N912SW                NA      NA       NA       27    IAH  ELP
## 31  N959SW                NA      NA       NA       28    IAH  ATL
## 32  N978SW                NA      NA       NA       37    IAH  MAF
## 33  N752SK                NA      NA       NA       64    IAH  ELP
## 34  N724SK                NA      NA       NA      103    IAH  ATL
## 35  N509UA                NA      NA       NA       -4    IAH  IAD
## 36  N469UA                NA      NA       NA       -3    IAH  DEN
## 37  N528UA                NA      NA       NA        0    IAH  DEN
## 38  N808UA                NA      NA       NA      110    IAH  IAD
## 39  N840AW                NA      NA       NA       -9    IAH  PHX
## 40  N452UW                NA      NA       NA      135    IAH  CLT
## 41  N456UW                NA      NA       NA      153    IAH  CLT
## 42  N714CB                NA      NA       NA       -4    HOU  HRL
## 43  N405WN                NA      NA       NA       -2    HOU  DAL
## 44  N640SW                NA      NA       NA       -1    HOU  DAL
## 45  N359SW                NA      NA       NA        1    HOU  DAL
## 46  N455WN                NA      NA       NA        3    HOU  SAT
## 47  N365SW                NA      NA       NA      548    HOU  DAL
##    Distance TaxiIn TaxiOut Cancelled CancellationCode Diverted
## 1       964     NA      NA         1                A        0
## 2       224     NA      19         1                A        0
## 3       925     NA      13         1                A        0
## 4      1190     NA      13         1                B        0
## 5       964     NA      NA         1                A        0
## 6       689     NA      NA         1                C        0
## 7      1400     NA      NA         1                B        0
## 8       853     NA      NA         1                B        0
## 9      1400     NA      NA         1                A        0
## 10      689     NA      NA         1                A        0
## 11      689     NA      NA         1                B        0
## 12      469     NA      NA         1                A        0
## 13      469     NA      NA         1                A        0
## 14      696     NA      NA         1                A        0
## 15     1075     NA      NA         1                A        0
## 16      469     NA      NA         1                A        0
## 17      468     NA      NA         1                A        0
## 18      883     NA      10         1                A        0
## 19      490     NA      25         1                A        0
## 20     1379     NA      NA         1                A        0
## 21      247     NA      11         1                A        0
## 22      925     NA      NA         1                A        0
## 23      689     NA      NA         1                B        0
## 24     1076     NA      17         1                A        0
## 25      140     NA      NA         1                A        0
## 26      913     NA      NA         1                A        0
## 27      429     NA      NA         1                A        0
## 28      781     NA      32         1                B        0
## 29      224     NA      10         1                B        0
## 30      667     NA      20         1                A        0
## 31      689     NA      19         1                A        0
## 32      429     NA      15         1                A        0
## 33      667     NA      12         1                B        0
## 34      689     NA      NA         1                B        0
## 35     1190     NA      NA         1                A        0
## 36      862     NA      NA         1                A        0
## 37      862     NA      NA         1                A        0
## 38     1190     NA      NA         1                A        0
## 39     1009     NA      NA         1                A        0
## 40      913     NA      NA         1                B        0
## 41      912     NA      15         1                A        0
## 42      276     NA       8         1                B        0
## 43      239     NA      NA         1                A        0
## 44      239     NA      13         1                A        0
## 45      239     NA       6         1                A        0
## 46      192     NA       5         1                A        0
## 47      239     NA      NA         1                A        0
##  [ reached getOption("max.print") -- omitted 21 rows ]

Se puede revertir el orden con desc() aplicado a la columna específica que queremos esté ordenada a la inversa.

# Arrange according to carrier and decreasing departure delays
arrange(hflights, UniqueCarrier, desc(DepDelay))
##        Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier
## 1      2011    12         12         1     650     808            AA
## 2      2011    11         19         6    1752    1910            AA
## 3      2011    12         22         4    1728    1848            AA
## 4      2011    10         23         7    2305       2            AA
## 5      2011     9         27         2    1206    1300            AA
## 6      2011     3         17         4    1647    1747            AA
## 7      2011     6         21         2     955    1315            AA
## 8      2011     5         20         5    2359     130            AA
## 9      2011     4         19         2    2023    2142            AA
## 10     2011     5         12         4    2133      53            AA
## 11     2011    10          9         7    1805    1930            AA
## 12     2011     4         14         4    2117      21            AA
## 13     2011     6         24         5    2137      56            AA
## 14     2011     8         19         5    1340    1716            AA
## 15     2011     6         25         6     906    1229            AA
## 16     2011     4         24         7    2304       3            AA
## 17     2011     2          4         5    2049    2353            AA
## 18     2011    12         16         5    2025    2347            AA
## 19     2011     5         25         3    1918    2039            AA
## 20     2011     8         24         3    1432    1540            AA
## 21     2011     4          4         1    1135    1323            AA
## 22     2011    10         17         1    1250    1622            AA
## 23     2011     8          7         7    2239    2331            AA
## 24     2011     5         28         6    1630    1725            AA
## 25     2011     7         22         5    1139    1247            AA
## 26     2011     4         17         7    2008    2309            AA
## 27     2011    11         18         5    1954    2310            AA
## 28     2011     4         28         4    1843    1949            AA
## 29     2011    10         24         1    1356    1455            AA
## 30     2011     6         13         1    2014    2328            AA
## 31     2011     1          9         7    1835    1951            AA
## 32     2011     4         15         5    2209    2305            AA
## 33     2011     2         17         4     912    1029            AA
## 34     2011     3          3         4    1054    1157            AA
## 35     2011    10         20         4    1924    2242            AA
## 36     2011     9         15         4    1922    2242            AA
## 37     2011     5         12         4    2158    2301            AA
## 38     2011     9          2         5    1613    1730            AA
## 39     2011    11         15         2    1742    1858            AA
## 40     2011     5         20         5    1551    1819            AA
## 41     2011     7         18         1    1956    2315            AA
## 42     2011     3         24         4    1359    1454            AA
## 43     2011    12         17         6    1805    1912            AA
## 44     2011    10          9         7    1909    2231            AA
## 45     2011    12         12         1    1339    1436            AA
## 46     2011     2          2         3    1813    1936            AA
## 47     2011     4         25         1    1543    1803            AA
##        FlightNum TailNum ActualElapsedTime AirTime ArrDelay DepDelay
## 1           1740  N473AA                78      49      978      970
## 2           1903  N495AA                78      40      685      677
## 3           1903  N580AA                80      40      663      653
## 4            742  N548AA                57      39      507      525
## 5           1948  N4YUAA                54      37      265      286
## 6           1505  N584AA                60      41      262      277
## 7            466  N3FTAA               140     120      230      235
## 8            426  N565AA                91      70      255      234
## 9           1925  N467AA                79      50      242      233
## 10          1294  N3AYAA               140     121      223      228
## 11           742  N555AA                85      45      235      225
## 12          1294  N3ENAA               124     107      191      212
## 13          1294  N3GCAA               139     119      191      212
## 14          1700  N3FLAA               156     123      196      195
## 15           466  N3BSAA               143     123      184      186
## 16           426  N594AA                59      47      168      179
## 17          1294  N3CXAA               124     108      158      174
## 18          1294  N3ECAA               142     112      167      170
## 19          1925  N526AA                81      54      179      168
## 20          1848  N535AA                68      43      160      167
## 21           493  N426AA               108      55      193      160
## 22          1946  N3DRAA               152     116      162      155
## 23           426  N558AA                52      37      126      154
## 24          1566  N536AA                55      41      135      150
## 25          1995  N578AA                68      41      137      144
## 26          1294  N3DEAA               121     107      119      143
## 27          1294  N3DNAA               136     114      130      139
## 28          1925  N528AA                66      49      129      133
## 29          1848  N482AA                59      42      115      131
## 30          1294  N3FBAA               134     114      103      129
## 31          1121  N574AA                76      50      126      125
## 32           426  N4XNAA                56      43      110      124
## 33          1225  N439AA                77      43      129      122
## 34           493  N4WWAA                63      45      107      119
## 35          1294  N3DVAA               138     114      112      119
## 36          1294  N3EWAA               140     118      112      117
## 37           426  N514AA                63      40      106      113
## 38           742  N4WUAA                77      42      115      113
## 39           458  N577AA                76      41      113      112
## 40          1566  N579AA               148      56      189      111
## 41          1294  N3BYAA               139     120       90      111
## 42          1505  N508AA                55      40       89      109
## 43          1033  N4YJAA                67      44      102      105
## 44          1294  N3BVAA               142     120      101      104
## 45           865  N4XDAA                57      37       91      104
## 46          1121  N537AA                83      41      111      103
## 47          1566  N4YBAA               140      52      173      103
##        Origin Dest Distance TaxiIn TaxiOut Cancelled CancellationCode
## 1         IAH  DFW      224     14      15         0                 
## 2         IAH  DFW      224      7      31         0                 
## 3         IAH  DFW      224      8      32         0                 
## 4         IAH  DFW      224      5      13         0                 
## 5         IAH  DFW      224     10       7         0                 
## 6         IAH  DFW      224      7      12         0                 
## 7         IAH  MIA      964      9      11         0                 
## 8         IAH  DFW      224      8      13         0                 
## 9         IAH  DFW      224     11      18         0                 
## 10        IAH  MIA      964      5      14         0                 
## 11        IAH  DFW      224     22      18         0                 
## 12        IAH  MIA      964      5      12         0                 
## 13        IAH  MIA      964      3      17         0                 
## 14        IAH  MIA      964     21      12         0                 
## 15        IAH  MIA      964      8      12         0                 
## 16        IAH  DFW      224      5       7         0                 
## 17        IAH  MIA      964      7       9         0                 
## 18        IAH  MIA      964      3      27         0                 
## 19        IAH  DFW      224      5      22         0                 
## 20        IAH  DFW      224      4      21         0                 
## 21        IAH  DFW      224      9      44         0                 
## 22        IAH  MIA      964     12      24         0                 
## 23        IAH  DFW      224      2      13         0                 
## 24        IAH  DFW      224      5       9         0                 
## 25        IAH  DFW      224      5      22         0                 
## 26        IAH  MIA      964      4      10         0                 
## 27        IAH  MIA      964     13       9         0                 
## 28        IAH  DFW      224      5      12         0                 
## 29        IAH  DFW      224      5      12         0                 
## 30        IAH  MIA      964      7      13         0                 
## 31        IAH  DFW      224      9      17         0                 
## 32        IAH  DFW      224      5       8         0                 
## 33        IAH  DFW      224      8      26         0                 
## 34        IAH  DFW      224      8      10         0                 
## 35        IAH  MIA      964      5      19         0                 
## 36        IAH  MIA      964      9      13         0                 
## 37        IAH  DFW      224      4      19         0                 
## 38        IAH  DFW      224     14      21         0                 
## 39        IAH  DFW      224      9      26         0                 
## 40        IAH  DFW      224      7      85         0                 
## 41        IAH  MIA      964      5      14         0                 
## 42        IAH  DFW      224      7       8         0                 
## 43        IAH  DFW      224      6      17         0                 
## 44        IAH  MIA      964     10      12         0                 
## 45        IAH  DFW      224     10      10         0                 
## 46        IAH  DFW      224      9      33         0                 
## 47        IAH  DFW      224      7      81         0                 
##        Diverted
## 1             0
## 2             0
## 3             0
## 4             0
## 5             0
## 6             0
## 7             0
## 8             0
## 9             0
## 10            0
## 11            0
## 12            0
## 13            0
## 14            0
## 15            0
## 16            0
## 17            0
## 18            0
## 19            0
## 20            0
## 21            0
## 22            0
## 23            0
## 24            0
## 25            0
## 26            0
## 27            0
## 28            0
## 29            0
## 30            0
## 31            0
## 32            0
## 33            0
## 34            0
## 35            0
## 36            0
## 37            0
## 38            0
## 39            0
## 40            0
## 41            0
## 42            0
## 43            0
## 44            0
## 45            0
## 46            0
## 47            0
##  [ reached getOption("max.print") -- omitted 227449 rows ]
# Arrange flights by total delay (normal order).
arrange(hflights, DepDelay + ArrDelay)
##        Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier
## 1      2011     7          3         7    1914    2039            XE
## 2      2011     8         31         3     934    1039            OO
## 3      2011     8         21         7     935    1039            OO
## 4      2011     8         28         7    2059    2206            OO
## 5      2011     8         29         1     935    1041            OO
## 6      2011    12         25         7     741     926            OO
## 7      2011     1         30         7     620     812            OO
## 8      2011     8          3         3    1741    1810            XE
## 9      2011     8          4         4     930    1041            OO
## 10     2011     8         18         4     939    1043            OO
## 11     2011     8         26         5    2107    2205            OO
## 12     2011     9         11         7    1451    1857            B6
## 13     2011    12         24         6    1112    1314            OO
## 14     2011     9         13         2    1838    1933            XE
## 15     2011    12         24         6    2129    2337            CO
## 16     2011     8         16         2     928    1207            CO
## 17     2011     5         29         7     627     812            OO
## 18     2011     9         14         3    1840    1933            XE
## 19     2011     1         19         3     754     837            XE
## 20     2011     8         30         2     938    1049            OO
## 21     2011    11         24         4    1720    2022            AA
## 22     2011    11         24         4    2100    2312            CO
## 23     2011     8         27         6     938    1040            OO
## 24     2011    12          4         7     946    1056            XE
## 25     2011     6         11         6    1753    2106            AA
## 26     2011     7         17         7    2059    2210            OO
## 27     2011     8         20         6     936    1043            OO
## 28     2011    12         24         6    1015    1435            CO
## 29     2011    12         24         6    2051      30            CO
## 30     2011    12          3         6    1913    2106            UA
## 31     2011     1         13         4     552     713            US
## 32     2011     7         16         6     935    1045            OO
## 33     2011     7         14         4     933    1047            OO
## 34     2011     8          1         1     933    1047            OO
## 35     2011     8         19         5    2104    2216            OO
## 36     2011     9         28         3     739     833            XE
## 37     2011     9          7         3     913    1006            XE
## 38     2011    12         24         6    1431    1613            CO
## 39     2011    12          1         4    1836    2027            OO
## 40     2011     2          6         7    1529    1917            B6
## 41     2011     2         23         3    1024    1440            CO
## 42     2011     6         28         2     932    1049            OO
## 43     2011     7          9         6     932    1049            OO
## 44     2011     7          3         7     934    1047            OO
## 45     2011     7         28         4    2104    2207            OO
## 46     2011     8         28         7     942    1049            OO
## 47     2011     8         25         4    2102    2219            OO
##        FlightNum TailNum ActualElapsedTime AirTime ArrDelay DepDelay
## 1           2804  N12157                85      66      -70       -1
## 2           2040  N783SK               185     172      -56      -11
## 3           2001  N767SK               184     171      -56      -10
## 4           2003  N783SK               187     171      -54      -11
## 5           2040  N767SK               186     169      -54      -10
## 6           4591  N814SK               165     147      -57       -4
## 7           4461  N804SK               172     156      -49      -10
## 8           2603  N11107                89      73      -40      -19
## 9           1171  N715SK               191     177      -49      -10
## 10          2001  N783SK               184     172      -52       -6
## 11          2003  N713SK               178     163      -55       -3
## 12           622  N658JB               186     161      -44      -14
## 13          5440  N728SK               182     131      -25      -33
## 14          2376  N15932                55      42      -44      -12
## 15          1552  N37437               248     234      -55       -1
## 16             1  N69059               459     441      -48       -7
## 17          4484  N822SK               165     153      -46       -8
## 18          4645  N14933                53      40      -44      -10
## 19          2001  N16951                43      32      -42      -11
## 20          2040  N779SK               191     177      -46       -7
## 21          1294  N3GVAA               122     111      -38      -15
## 22          1616  N27205               132     117      -38      -15
## 23          2001  N744SK               182     162      -50       -2
## 24          6105  N23139               130     112      -41      -11
## 25          1294  N3BUAA               133     118      -39      -12
## 26          1172  N779SK               191     170      -45       -6
## 27          2001  N776SK               187     170      -47       -4
## 28          1134  N16646               200     172      -41      -10
## 29          1452  N38424               159     143      -47       -4
## 30           473  N460UA               233     207      -43       -8
## 31           270  N334AW               141     128      -42       -8
## 32          1171  N779SK               190     173      -45       -5
## 33          1171  N710SK               194     177      -43       -7
## 34          1171  N754SK               194     171      -43       -7
## 35          2003  N767SK               192     177      -44       -6
## 36          2145  N12934                54      36      -34      -16
## 37          2231  N14943                53      32      -33      -17
## 38          1737  N73860               222     204      -51        1
## 39          4455  N340CA               171     156      -41       -9
## 40           624  N531JB               168     154      -43       -6
## 41           432  N19623               196     178      -38      -11
## 42          1171  N742SK               197     177      -41       -8
## 43          1171  N756SK               197     173      -41       -8
## 44          1171  N719SK               193     174      -43       -6
## 45          1172  N745SK               183     170      -48       -1
## 46          2001  N767SK               187     171      -46       -3
## 47          2003  N772SK               197     179      -41       -8
##        Origin Dest Distance TaxiIn TaxiOut Cancelled CancellationCode
## 1         IAH  MEM      468      4      15         0                 
## 2         IAH  BFL     1428      3      10         0                 
## 3         IAH  BFL     1428      3      10         0                 
## 4         IAH  BFL     1428      5      11         0                 
## 5         IAH  BFL     1428      4      13         0                 
## 6         IAH  SLC     1195      4      14         0                 
## 7         IAH  SLC     1195      5      11         0                 
## 8         IAH  HOB      501      5      11         0                 
## 9         IAH  BFL     1428      4      10         0                 
## 10        IAH  BFL     1428      4       8         0                 
## 11        IAH  BFL     1428      5      10         0                 
## 12        HOU  JFK     1428      6      19         0                 
## 13        IAH  ASE      913      3      48         0                 
## 14        IAH  MLU      262      4       9         0                 
## 15        IAH  SEA     1874      5       9         0                 
## 16        IAH  HNL     3904      7      11         0                 
## 17        IAH  SLC     1195      4       8         0                 
## 18        IAH  MLU      262      4       9         0                 
## 19        IAH  LFT      201      4       7         0                 
## 20        IAH  BFL     1428      3      11         0                 
## 21        IAH  MIA      964      4       7         0                 
## 22        IAH  ORD      925      4      11         0                 
## 23        IAH  BFL     1428      4      16         0                 
## 24        IAH  DEN      862      8      10         0                 
## 25        IAH  MIA      964      4      11         0                 
## 26        IAH  BFL     1428      5      16         0                 
## 27        IAH  BFL     1428      4      13         0                 
## 28        IAH  BOS     1597      9      19         0                 
## 29        IAH  EWR     1400      9       7         0                 
## 30        IAH  SFO     1635     10      16         0                 
## 31        IAH  PHX     1009      3      10         0                 
## 32        IAH  BFL     1428      4      13         0                 
## 33        IAH  BFL     1428      5      12         0                 
## 34        IAH  BFL     1428      5      18         0                 
## 35        IAH  BFL     1428      4      11         0                 
## 36        IAH  LFT      201      5      13         0                 
## 37        IAH  SHV      192      5      16         0                 
## 38        IAH  SFO     1635      7      11         0                 
## 39        IAH  SLC     1195      5      10         0                 
## 40        HOU  JFK     1428      5       9         0                 
## 41        IAH  LGA     1416      5      13         0                 
## 42        IAH  BFL     1428      5      15         0                 
## 43        IAH  BFL     1428      4      20         0                 
## 44        IAH  BFL     1428      4      15         0                 
## 45        IAH  BFL     1428      3      10         0                 
## 46        IAH  BFL     1428      4      12         0                 
## 47        IAH  BFL     1428      6      12         0                 
##        Diverted
## 1             0
## 2             0
## 3             0
## 4             0
## 5             0
## 6             0
## 7             0
## 8             0
## 9             0
## 10            0
## 11            0
## 12            0
## 13            0
## 14            0
## 15            0
## 16            0
## 17            0
## 18            0
## 19            0
## 20            0
## 21            0
## 22            0
## 23            0
## 24            0
## 25            0
## 26            0
## 27            0
## 28            0
## 29            0
## 30            0
## 31            0
## 32            0
## 33            0
## 34            0
## 35            0
## 36            0
## 37            0
## 38            0
## 39            0
## 40            0
## 41            0
## 42            0
## 43            0
## 44            0
## 45            0
## 46            0
## 47            0
##  [ reached getOption("max.print") -- omitted 227449 rows ]

7.2.5 summarize()

La función summarize() nos permite obtener un nuevo data frame con el resumen de nuestros datos. Su funcionamiento sería similar a mutate(), donde añadimos el nombre del conjunto de datos, seguido de las funciones que queremos, como la media, desviación estándar, mínimo, etc. La diferencia con esta última función es que el resultado que devuelve debe ser un único número.

summarize(hflights, Media_distancia = mean(Distance))
##   Media_distancia
## 1        787.7832
# Print out a summary with variables min_dist and max_dist
summarize(hflights, min_dist = min(Distance), max_dist = max(Distance))
##   min_dist max_dist
## 1       79     3904
# Print out a summary with variable max_div
summarize(filter(hflights, Diverted == 1), max_div = max(Distance))
##   max_div
## 1    3904
# Remove rows that have NA ArrDelay: temp1
temp1 <- filter(hflights, !is.na(ArrDelay))

# Generate summary about ArrDelay column of temp1
summarize(temp1, earliest = min(ArrDelay), average = mean(ArrDelay), latest = max(ArrDelay), sd = sd(ArrDelay))
##   earliest  average latest       sd
## 1      -70 7.094334    978 30.70852
# Keep rows that have no NA TaxiIn and no NA TaxiOut: temp2
temp2 <- filter(hflights, !is.na(TaxiIn) & !is.na(TaxiOut))

# Print the maximum taxiing difference of temp2 with summarize()
summarize(temp2, max_taxi_diff = max(abs(TaxiIn - TaxiOut)))
##   max_taxi_diff
## 1           160

Aparte de las funciones como min(), max(), mean(), median(), quantile(), sd(), var(), IQR(), todas incluidas en los paquetes base de R, dplyr incorpora otras funciones con este propósito que pueden sernos útiles:

  • first(x) - The first element of vector x.
  • last(x) - The last element of vector x.
  • nth(x, n) - The nth element of vector x.
  • n() - The number of rows in the data.frame or group of observations that summarize() describes.
  • n_distinct(x) - The number of unique values in vector x.
# Print out a summary of hflights with the following variables:
# n_obs: the total number of observations,
# n_carrier: the total number of carriers,
# n_dest: the total number of destinations,
# Generate summarizing statistics for hflights
summarize(hflights,
          n_obs = n(),
          n_carrier = n_distinct(UniqueCarrier),
          n_dest = n_distinct(Dest))
##    n_obs n_carrier n_dest
## 1 227496        15    116
# aa, a tbl with all flights flown by American Airlines, is already available.
# Print out a summary of aa with the following variables:
# n_flights: the total number of flights (each observation is a flight),
# n_canc: the total number of cancelled flights,
# avg_delay: the average arrival delay of flights whose delay is not NA (na.rm = TRUE).
# All American Airline flights
aa <- filter(hflights, UniqueCarrier == "American")

# Generate summarizing statistics for aa 
summarize(aa, n_flights = n(), 
          n_canc = sum(Cancelled == 1), 
          avg_delay = mean(ArrDelay, na.rm = TRUE))
##   n_flights n_canc avg_delay
## 1         0      0       NaN

7.2.6 Pipe (%>%)

El uso de los 5 verbos es muy natural y directo. Sin embargo, cuando queremos emplear varios de ellos en nuestro data frame, aunque útiles, puede convertirse en una cadena muy larga texto algo redundante y difícil de leer.

Para estas opciones de encadenamiento de funciones, se inventó el operador pipe (%>%) del paquete magritt. Sin embargo, no hace falta cargar ese paquete porque dplyr lo tiene implementado. De este modo, la estructura y lectura del código es mucho más natural, sencilla y óptima.

Las características de pipe es que primero añadimos el nombre del dataframe seguido del operador pipe, y a continuación las funciones encadanedas. Lo interesante del pipe es que no hace falta incluir en cada una de las funciones el nombre del dataframe, que coge del inicio del pipe. Básicamente lo que hace es rellenar automáticamente el primer argumento de las funciones con el nombre del dataframe.

Su lectura sería de este modo: a este dataframe, le haces primero esto, luego esto otro, y luego esto. Es un modo de encadenar funciones.

# Take the hflights data set and then ...
# Add a variable named diff that is the result of subtracting TaxiIn from TaxiOut, and then ...
# Pick all of the rows whose diff value does not equal NA, and then ...
# Summarize the data set with a value named avg that is the mean diff value.

hflights %>% 
  mutate(diff = TaxiOut - TaxiIn) %>%
  filter(!is.na(diff)) %>%
  summarize(avg = mean(diff))
##        avg
## 1 8.992064

mutate() the hflights dataset and add two variables:

RealTime: the actual elapsed time plus 100 minutes (for the overhead that flying involves) and
mph: calculated as 60 times Distance divided by RealTime, then

filter() to keep observations that have an mph that is not NA and that is below 70, finally summarize() the result by creating four summary variables:

n_less, the number of observations,
n_dest, the number of destinations,
min_dist
max_dist
hflights %>%
  mutate(RealTime = ActualElapsedTime + 100,
         mph = 60 * Distance/ RealTime) %>%
  filter(!is.na(mph) & mph < 70) %>%
  summarize(n_less = n(),
            n_dest = n_distinct(Dest),
            min_dist = min(Distance),
            max_dist = max(Distance))
##   n_less n_dest min_dist max_dist
## 1   6726     13       79      305
filter() the result of mutate to:
keep observations that have an mph under 105 or for which Cancelled equals 1 or for which Diverted equals 1.
summarize() the result by creating four summary variables:
n_non, the number of observations,
n_dest, the number of destinations,
min_dist, the minimum distance and
max_dist, the maximum distance.
hflights %>%
  mutate(
    RealTime = ActualElapsedTime + 100, 
    mph = 60 * Distance / RealTime) %>%
    filter(mph < 105 | Cancelled == 1 | Diverted == 1) %>%
    summarize(n_non = n(), 
              n_dest = n_distinct(Dest),
              min_dist = min(Distance),
              max_dist = max(Distance))
##   n_non n_dest min_dist max_dist
## 1 42400    113       79     3904
filter() the hflights tbl to keep only observations whose DepTime is not NA, whose ArrTime is not NA and for which DepTime exceeds ArrTime.
Pipe the result into a summarize() call to create a single summary variable: num, that simply counts the number of observations.
hflights %>% 
  filter(!is.na(DepTime) & !is.na(ArrTime) & DepTime > ArrTime) %>%
  summarize(num = n())
##    num
## 1 2718

7.3 group_by y el trabajo con bases de datos

group_by() es una función interesantísima ya que nos permite aplicar los 5 verbos anteriores dividido por grupos, por ejemplo, las categorías de una variable. Ya no tendríamos la media de toda una variable si no la media de todos los grupos. Nos devolvería un dataframe.

Su combinación con summarize() es fantástica. Su estructura es:

group_by(conjunto_de_datos, variable1, variable2...)
Use group_by() to group hflights by UniqueCarrier.
summarize() the grouped tbl with two summary variables:
    p_canc, the percentage of cancelled flights.
    avg_delay, the average arrival delay of flights whose delay does not equal NA. 
Finally, order the carriers in the summary from low to high by their average arrival delay. Use percentage of flights cancelled to break any ties.
hflights %>%
  group_by(UniqueCarrier) %>%
  summarize(
    p_canc = 100 * mean(Cancelled == 1),
    avg_delay = mean(ArrDelay, na.rm = TRUE)) %>%
  arrange(avg_delay, p_canc)
## # A tibble: 15 x 3
##    UniqueCarrier p_canc avg_delay
##    <chr>          <dbl>     <dbl>
##  1 US             1.13     -0.631
##  2 AA             1.85      0.892
##  3 FL             0.982     1.85 
##  4 AS             0         3.19 
##  5 YV             1.27      4.01 
##  6 DL             1.59      6.08 
##  7 CO             0.678     6.10 
##  8 MQ             2.90      7.15 
##  9 EV             3.45      7.26 
## 10 WN             1.55      7.59 
## 11 F9             0.716     7.67 
## 12 XE             1.55      8.19 
## 13 OO             1.39      8.69 
## 14 B6             2.59      9.86 
## 15 UA             1.64     10.5

Podemos también realizar una combinación de group_by() con mutate().

filter() the hflights tbl to only keep observations for which ArrDelay is not NA and positive.
Use group_by() on the result to group by UniqueCarrier.
Next, use summarize() to calculate the average ArrDelay per carrier. Call this summary variable avg.
Feed the result into a mutate() call: create a new variable, rank, calculated as rank(avg).
Finally, arrange by this new rank variable
hflights %>%
  filter(!is.na(ArrDelay) & ArrDelay > 0) %>%
  group_by(UniqueCarrier) %>%
  summarize(avg = mean(ArrDelay)) %>%
  mutate(rank = rank(avg)) %>%
  arrange(rank)
## # A tibble: 15 x 3
##    UniqueCarrier   avg  rank
##    <chr>         <dbl> <dbl>
##  1 YV             18.7     1
##  2 F9             18.7     2
##  3 US             20.7     3
##  4 CO             22.1     4
##  5 AS             22.9     5
##  6 OO             24.1     6
##  7 XE             24.2     7
##  8 WN             25.3     8
##  9 FL             27.9     9
## 10 AA             28.5    10
## 11 DL             32.1    11
## 12 UA             32.5    12
## 13 MQ             38.8    13
## 14 EV             40.2    14
## 15 B6             45.5    15
# How many airplanes only flew to one destination?
hflights %>%
  group_by(TailNum) %>%
  summarize(ndest = n_distinct(Dest)) %>%
  filter(ndest == 1) %>%
  summarize(nplanes = n())
## # A tibble: 1 x 1
##   nplanes
##     <int>
## 1    1526
# Find the most visited destination for each carrier
hflights %>%
  group_by(UniqueCarrier, Dest) %>%
  summarize(n = n()) %>%
  mutate(rank = rank(desc(n))) %>%
  filter(rank == 1)
## # A tibble: 15 x 4
## # Groups:   UniqueCarrier [15]
##    UniqueCarrier Dest      n  rank
##    <chr>         <chr> <int> <dbl>
##  1 AA            DFW    2105     1
##  2 AS            SEA     365     1
##  3 B6            JFK     695     1
##  4 CO            EWR    3924     1
##  5 DL            ATL    2396     1
##  6 EV            DTW     851     1
##  7 F9            DEN     837     1
##  8 FL            ATL    2029     1
##  9 MQ            DFW    2424     1
## 10 OO            COS    1335     1
## 11 UA            SFO     643     1
## 12 US            CLT    2212     1
## 13 WN            DAL    8243     1
## 14 XE            CRP    3175     1
## 15 YV            CLT      71     1

7.3.1 dplyr y bases de datos

dplyr te permite trabajar con otro tipo de formato de base de datos, no solo con tbl.

library(data.table)
hflights2 <- as.data.table(hflights)

class(hflights) 
## [1] "data.frame"
class(hflights2)
## [1] "data.table" "data.frame"
# Use summarize to calculate n_carrier
summarize(hflights2, n_carrier = n_distinct(UniqueCarrier))
##   n_carrier
## 1        15

DataCamp hosts a mySQL database with data about flights that departed from New York City in 2013. The data is similar to the data in hflights, but it does not contain information about cancellations or diversions. With the tbl() function, we already created a reference to a table in this information.

Although nycflights is a reference to data that lives outside of R, you can use the dplyr commands on them as usual. Behind the scenes, dplyr will convert the commands to the database’s native language (in this case, SQL), and return the results. This allows you to pull data that is too large to fit in R: only the fraction of the data that you need will actually be downloaded into R, which will usually fit into R without memory issues.

Try to understand the code that creates nycflights, a reference to a MySQL table.
Use glimpse() to check out nycflights. Although nycflights is a reference to a tbl in a remote database, there is no difference in syntax. Look carefully: the variable names in nycflights differ from the ones in hflights!
Group nycflights data by carrier, then summarize() with two variables: n_flights, the number of flights flown by each carrier and avg_delay, the average arrival delay of flights flown by each carrier. Finally, arrange the carriers by average delay from low to high.
# Set up a connection to the mysql database
my_db <- src_mysql(dbname = "dplyr", 
                   host = "courses.csrrinzqubik.us-east-1.rds.amazonaws.com", 
                   port = 3306, 
                   user = "student",
                   password = "datacamp")

# Reference a table within that source: nycflights
nycflights <- tbl(my_db, "dplyr")

# glimpse at nycflights
glimpse(nycflights)
## Observations: ??
## Variables: 17
## Database: mysql 5.6.34-log [student@courses.csrrinzqubik.us-east-1.rds.amazonaws.com:/dplyr]
## $ id        <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,…
## $ year      <int> 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, …
## $ month     <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ day       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ dep_time  <int> 517, 533, 542, 544, 554, 554, 555, 557, 557, 558, 558,…
## $ dep_delay <int> 2, 4, 2, -1, -6, -4, -5, -3, -3, -2, -2, -2, -2, -2, -…
## $ arr_time  <int> 830, 850, 923, 1004, 812, 740, 913, 709, 838, 753, 849…
## $ arr_delay <int> 11, 20, 33, -18, -25, 12, 19, -14, -8, 8, -2, -3, 7, -…
## $ carrier   <chr> "UA", "UA", "AA", "B6", "DL", "UA", "B6", "EV", "B6", …
## $ tailnum   <chr> "N14228", "N24211", "N619AA", "N804JB", "N668DN", "N39…
## $ flight    <int> 1545, 1714, 1141, 725, 461, 1696, 507, 5708, 79, 301, …
## $ origin    <chr> "EWR", "LGA", "JFK", "JFK", "LGA", "EWR", "EWR", "LGA"…
## $ dest      <chr> "IAH", "IAH", "MIA", "BQN", "ATL", "ORD", "FLL", "IAD"…
## $ air_time  <int> 227, 227, 160, 183, 116, 150, 158, 53, 140, 138, 149, …
## $ distance  <int> 1400, 1416, 1089, 1576, 762, 719, 1065, 229, 944, 733,…
## $ hour      <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, …
## $ minute    <int> 17, 33, 42, 44, 54, 54, 55, 57, 57, 58, 58, 58, 58, 58…
# Ordered, grouped summary of nycflights
nycflights %>% 
   group_by(carrier) %>%
   summarize(n_flights = n(), avg_delay = mean(arr_delay)) %>%
   arrange(avg_delay)
## Warning: Missing values are always removed in SQL.
## Use `AVG(x, na.rm = TRUE)` to silence this warning
## Warning in .local(conn, statement, ...): Decimal MySQL column 2 imported as
## numeric
## # Source:     lazy query [?? x 3]
## # Database:   mysql 5.6.34-log
## #   [student@courses.csrrinzqubik.us-east-1.rds.amazonaws.com:/dplyr]
## # Ordered by: avg_delay
##    carrier n_flights avg_delay
##    <chr>       <dbl>     <dbl>
##  1 AS            714    -9.86 
##  2 HA            342    -6.92 
##  3 AA          32729     0.356
##  4 DL          48110     1.63 
##  5 VX           5162     1.75 
##  6 US          20536     2.06 
##  7 UA          58665     3.50 
##  8 9E          18460     6.91 
##  9 B6          54635     9.36 
## 10 WN          12275     9.47 
## # … with more rows