Introduction

Finding the cheapest flight from point A to point B can be a headache, especially when there are additional constraints such as duration, layovers, and departure and arrival times. The goal of the flightscanner package is to provide a simple and straightforward interface to the Skyscanner API on Rapid API through R. The Skyscanner API lets users search for flights and query flight prices from Skyscanner’s database, as well as quotes from ticketing agencies. Besides these basic flight-search functions, flightscanner also allows users to schedule searches and record the results automatically. In addition, the package provides a Shiny app that visualizes the trip on a map and shows the ticket options that satisfy user-defined constraints.

Getting Started with flightscanner

At the time of this writing, flightscanner has not been submitted to CRAN. For now, the package can be installed from GitHub with the function install_github() from the devtools package.

devtools::install_github("MinZhang95/flightscanner")
library(flightscanner)

Setup with API key

The first step in using the flightscanner package is to initialize the API connection to Skyscanner. If this is your first time loading the package, you will be asked to enter the API key received from Skyscanner in the console. Two prompts guide you through setting up the key:

API key is required!
Please follow the instructions to get the key:
1. Browse and login:  https://rapidapi.com/skyscanner/api/skyscanner-flight-search 
   Do you want to visit this website (1 for YES; 0 for NO)?
   
2. Copy the value of X-RapidAPI-Key in Header Parameters.
   Paste your key (without quote):

If you answer “1” to the first question, you will be directed to the Rapid API Skyscanner webpage, where the API key can be found in the right panel (Figure 1). Paste this key in response to the second question.

Figure 1: Rapid API Skyscanner and API-key


With a valid API key, a welcome message is shown:

Welcome to FlightScanner!

The valid API key is stored in “APIkey.txt” under the current working directory, so that it does not have to be entered again every time the package is loaded.

With an invalid API key, however, a failure message is shown:

Check your key or network connection. And use function `apiSetKey` to set key later.

Alternatively, users can set (or reset) the API key manually with the function apiSetKey():

apiSetKey("YOUR KEY")

Please note that apiSetKey() does not create or overwrite “APIkey.txt” in the current working directory.

To obtain the global API key, use the function apiGetKey():

apiGetKey()

This function returns the API key only if it has been set up successfully; otherwise it returns NULL.
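
The key-caching behaviour described in this section can be pictured with a small base-R sketch. The helpers save_key() and load_key() are hypothetical and are not part of flightscanner; only the file name “APIkey.txt” comes from the text, and a temporary directory stands in for the working directory.

```r
# Hypothetical helpers (NOT flightscanner functions) that mimic caching a key
# in a plain-text file so that it does not have to be typed again.
save_key <- function(key, dir) {
  path <- file.path(dir, "APIkey.txt")
  writeLines(key, path)
  invisible(path)
}

load_key <- function(dir) {
  path <- file.path(dir, "APIkey.txt")
  if (!file.exists(path)) return(NULL)  # no cached key yet
  key <- readLines(path, n = 1, warn = FALSE)
  if (length(key) == 0 || !nzchar(key)) NULL else key
}

# Use a scratch directory instead of the real working directory
demo_dir <- file.path(tempdir(), "flightscanner-key-demo")
dir.create(demo_dir, showWarnings = FALSE)
unlink(file.path(demo_dir, "APIkey.txt"))  # start from a clean state

first_try <- load_key(demo_dir)   # NULL, because nothing is cached yet
save_key("YOUR KEY", demo_dir)
second_try <- load_key(demo_dir)  # the cached key
```

Returning NULL when no key is available mirrors the behaviour described for apiGetKey(), which makes the "is a key set?" check a simple is.null() test.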

Main Functions

Download data with API

apiCreateSession()

apiCreateSession() takes the flight information (origin, destination, and dates) and creates a session on the API server. The output contains a session ID. For example, to search for a one-way ticket from Des Moines to Detroit for one adult on 2019-06-01 (the departure date cannot be earlier than the current date):

dsm2dtw_session <- 
  apiCreateSession(origin = "DSM", destination = "DTW", startDate = "2019-06-01", adults = 1)

The output of apiCreateSession() is used as the input of apiPollSession().

apiPollSession()

apiPollSession() retrieves the flight data for a session created with apiCreateSession() and allows users to sort and filter the tickets by various criteria. All filter arguments default to NULL, meaning that nothing is filtered out before the data are returned. For example, to retrieve the previous search sorted by price in ascending order:

dsm2dtw_res <- apiPollSession(response = dsm2dtw_session, sortType = "price", sortOrder = "asc")

Let’s check the content of the output of apiPollSession():

dsm2dtw_res %>% content %>% names
#>  [1] "SessionKey"  "Query"       "Status"      "Itineraries" "Legs"       
#>  [6] "Segments"    "Carriers"    "Agents"      "Places"      "Currencies"

The output of apiPollSession() is hard to read directly because it contains several nested sub-lists, such as “Itineraries”, “Legs”, and “Segments”. The relationship between these terms is shown below.

\[ \text{searching result} \begin{cases} \text{itinerary_1} \begin{cases} \text{leg_1} \begin{cases} \text{segment_1} \\ \text{segment_2} \\ \vdots \\ \text{segment_S} \end{cases} \\ \text{leg_2} \begin{cases} \text{segment_1} \end{cases} \end{cases} \\ \text{itinerary_2} \begin{cases} \text{leg_1} \begin{cases} \text{segment_1} \\ \text{segment_2} \end{cases} \\ \text{leg_2} \begin{cases} \text{segment_1} \end{cases} \end{cases} \\ \vdots \\ \text{itinerary_n} \begin{cases} \text{leg_1} \begin{cases} \text{segment_1} \end{cases} \\ \text{leg_2} \begin{cases} \text{segment_1} \end{cases} \end{cases} \end{cases} \]

One search request may return several itineraries. A one-way trip contains one leg, whereas a round trip contains two: an outbound leg and an inbound leg. A leg contains several segments if it is not a direct flight.
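
As a rough illustration of this hierarchy, the sketch below builds a toy nested list in base R. It is not the structure actually returned by the API: the element names and the routings through ORD and MSP are assumptions made up for the example.

```r
# Toy search result: itineraries contain legs, and legs contain segments.
# The routings are invented for illustration only.
result <- list(
  itinerary_1 = list(
    leg_1 = list(segments = c("DSM-ORD", "ORD-DTW"))  # one-way, one stop
  ),
  itinerary_2 = list(
    leg_1 = list(segments = c("DSM-DTW")),            # outbound, direct
    leg_2 = list(segments = c("DTW-MSP", "MSP-DSM"))  # inbound, one stop
  )
)

# Number of legs per itinerary: 1 = one-way trip, 2 = round trip
n_legs <- sapply(result, length)

# Number of stops per leg = number of segments - 1 (0 means a direct flight)
n_stops <- lapply(result, function(itinerary) {
  sapply(itinerary, function(leg) length(leg$segments) - 1)
})

n_legs
n_stops
```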

Data Processing

flightGet()

flightGet() takes either the result of apiPollSession() or a database connection (explained later in the “Data Storage” section). The output is a list of seven dataframes, whose names are printed below:

dsm2dtw_df <- dsm2dtw_res %>% flightGet()
#> Warning: Unmatch of Segments and Stops: [LegId] =
#> 11140-1906010743--30963,-32462-3-11152-1906020959
#> Warning: Unmatch of Segments and Stops: [LegId] =
#> 11140-1906011413--30963-1-11152-1906020959
#> Warning: Unmatch of Segments and Stops: [LegId] =
#> 11140-1906010713--30963-1-11152-1906011729
names(dsm2dtw_df)
#> [1] "price"       "itineraries" "legs"        "segments"    "carriers"   
#> [6] "agents"      "places"

The dataframe “price” provides information such as the search time and the pricing options:

dsm2dtw_df$price %>% head(3) %>% print(width = 120)
#> # A tibble: 3 x 4
#>   SearchTime          OutboundLegId                             
#>   <dttm>              <chr>                                     
#> 1 2019-05-09 19:55:01 11140-1906011835--31722-1-11152-1906012344
#> 2 2019-05-09 19:55:01 11140-1906010615--31722-1-11152-1906011349
#> 3 2019-05-09 19:55:01 11140-1906011520--31722-1-11152-1906012026
#>   InboundLegId PricingOptions  
#>   <chr>        <list>          
#> 1 ""           <tibble [1 × 3]>
#> 2 ""           <tibble [1 × 3]>
#> 3 ""           <tibble [1 × 3]>

Within the same itinerary, there might be several different prices due to different agents:

dsm2dtw_df$price$PricingOptions[[39]] %>% print(width = 120)
#> # A tibble: 2 x 3
#>   AgentId Price
#>     <int> <dbl>
#> 1 1960211 4037.
#> 2 3987731 4052.
#>   LinkURL                                                                  
#>   <chr>                                                                    
#> 1 http://partners.api.skyscanner.net/apiservices/deeplink/v2?_cje=NqagL4PV…
#> 2 http://partners.api.skyscanner.net/apiservices/deeplink/v2?_cje=NqagL4PV…
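
When several agents quote the same itinerary, a natural next step is to keep only the cheapest quote. The following base-R sketch does this on a toy dataframe with a list column shaped like “PricingOptions”; the leg IDs, agent IDs and prices are invented, and the real output is a tibble rather than a plain dataframe.

```r
# Toy stand-in for the "price" dataframe; all values are hypothetical.
price <- data.frame(OutboundLegId = c("leg-A", "leg-B"),
                    stringsAsFactors = FALSE)
price$PricingOptions <- list(
  data.frame(AgentId = c(1L, 2L), Price = c(410, 395)),  # two agents
  data.frame(AgentId = 3L, Price = 520)                  # one agent
)

# For each itinerary, keep the pricing option with the lowest price
cheapest <- do.call(rbind, lapply(seq_len(nrow(price)), function(i) {
  opts <- price$PricingOptions[[i]]
  best <- opts[which.min(opts$Price), ]
  data.frame(OutboundLegId = price$OutboundLegId[i],
             AgentId = best$AgentId,
             Price = best$Price,
             stringsAsFactors = FALSE)
}))

cheapest
```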

The dataframe “legs” provides information such as the duration and the number of stops:

dsm2dtw_df$legs %>% head(3) %>% print(width = 120)
#> # A tibble: 3 x 9
#>   Id                  SegmentIds OriginId DestinationId DepartureTime      
#>   <chr>               <list>        <int>         <int> <dttm>             
#> 1 11140-1906010605--… <chr [3]>     11140         11152 2019-06-01 06:05:00
#> 2 11140-1906011658--… <chr [3]>     11140         11152 2019-06-01 16:58:00
#> 3 11140-1906011705--… <chr [2]>     11140         11152 2019-06-01 17:05:00
#>   ArrivalTime         Duration No.Stops Stops               
#>   <dttm>                 <int>    <int> <list>              
#> 1 2019-06-01 18:14:00      669        2 <data.frame [2 × 2]>
#> 2 2019-06-02 14:20:00     1222        2 <data.frame [2 × 2]>
#> 3 2019-06-02 22:25:00     1700        1 <data.frame [1 × 2]>

We can also check the stops and the layover (in minutes) at each stop for each leg in the “legs” dataframe:

dsm2dtw_df$legs$Stops %>% head(3) %>% print(width = 120)
#> [[1]]
#>   StopId Layover
#> 1  12389      81
#> 2  15062     191
#> 
#> [[2]]
#>   StopId Layover
#> 1  15062      70
#> 2  16177     525
#> 
#> [[3]]
#>   StopId Layover
#> 1  10959    1416
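
The per-stop layovers can be summed to get the total layover of each leg. The sketch below rebuilds the three “Stops” tables printed above as plain dataframes (so that it runs without an API call) and adds them up with sapply(); the object names are chosen for the example.

```r
# The three Stops tables shown above, re-entered by hand
stops <- list(
  data.frame(StopId = c(12389L, 15062L), Layover = c(81L, 191L)),
  data.frame(StopId = c(15062L, 16177L), Layover = c(70L, 525L)),
  data.frame(StopId = 10959L, Layover = 1416L)
)

# Total layover per leg, in minutes
total_layover <- sapply(stops, function(s) sum(s$Layover))
total_layover

# The same totals in hours, rounded to one decimal place
round(total_layover / 60, 1)
```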

Similarly, the detailed results about the segments are stored in the “segments” dataframe:

dsm2dtw_df$segments %>% head(2) %>% print(width = 120)
#> # A tibble: 2 x 9
#>   Id         OriginId DestinationId DepartureTime       ArrivalTime        
#>   <chr>         <int>         <int> <dttm>              <dttm>             
#> 1 11140-190…    11140         12389 2019-06-01 06:05:00 2019-06-01 08:41:00
#> 2 12389-190…    12389         15062 2019-06-01 10:02:00 2019-06-01 12:39:00
#>   Duration CarrierId OperatingCarrierId FlightNumber
#>      <int>     <int>              <int> <chr>       
#> 1      156      1793               -676 6115        
#> 2      157      1793               1793 1403

In the above outputs, the carriers and stops are represented by their IDs. To “translate” the IDs to names, look them up in the “carriers” and “places” dataframes:

dsm2dtw_df$carriers %>% head(1) %>% print(width = 120)
#> # A tibble: 1 x 4
#>      Id Code  Name                            
#>   <int> <chr> <chr>                           
#> 1  -676 ""    Mesa Airlines DBA United Express
#>   ImageURL                                              
#>   <chr>                                                 
#> 1 https://s1.apideeplink.com/images/airlines/default.png
dsm2dtw_df$places %>% head(1) %>% print(width = 120)
#> # A tibble: 1 x 5
#>      Id ParentId Code  Type    Name      
#>   <int>    <int> <chr> <chr>   <chr>     
#> 1 11140     2266 DSM   Airport Des Moines
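
One way to attach readable codes to the ID columns is a lookup with match(). The sketch below uses small hand-made dataframes; the numeric IDs are copied from the output above, but only the code “DSM” is known from this document, so “STOP1” and “STOP2” are placeholders rather than real airport codes.

```r
# Toy segments table using IDs from the output above
segments <- data.frame(
  OriginId = c(11140L, 12389L),
  DestinationId = c(12389L, 15062L)
)

# Toy places table; STOP1 and STOP2 are placeholder codes
places <- data.frame(
  Id = c(11140L, 12389L, 15062L),
  Code = c("DSM", "STOP1", "STOP2"),
  stringsAsFactors = FALSE
)

# match() returns the row of `places` that holds each ID
segments$OriginCode <- places$Code[match(segments$OriginId, places$Id)]
segments$DestinationCode <- places$Code[match(segments$DestinationId, places$Id)]

segments
```

The same pattern applies to “CarrierId” and the “carriers” dataframe. An ID that is missing from the lookup table yields NA instead of an error.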

flightFilter()

flightFilter() filters the results obtained from flightGet(). Continuing the previous example, suppose the user is looking for flights with a budget of $1,000, no more than one stop, and a departure time after 8 AM:

flightFilter(dsm2dtw_df, max_price = 1000, max_stops = 1, out_departure = c("08:00","24:00")) %>% head(3)
#> # A tibble: 3 x 15
#>   OutboundLegId InboundLegId PricingOptions OutboundLegSegm…
#>   <chr>         <chr>        <list>         <list>          
#> 1 11140-190601… ""           <tibble [1 × … <data.frame [2 …
#> 2 11140-190601… ""           <tibble [1 × … <data.frame [2 …
#> 3 11140-190601… ""           <tibble [1 × … <data.frame [2 …
#> # … with 11 more variables: OutboundLegDepartureTime <dttm>,
#> #   OutboundLegArrivalTime <dttm>, OutboundLegDuration <int>,
#> #   OutboundLegNo.Stops <int>, OutboundLegStops <list>,
#> #   InboundLegSegments <list>, InboundLegDepartureTime <dttm>,
#> #   InboundLegArrivalTime <dttm>, InboundLegDuration <int>,
#> #   InboundLegNo.Stops <int>, InboundLegStops <list>
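
To make the three conditions concrete, here is a minimal base-R sketch that applies the same kind of filter to a toy dataframe. It is not the implementation of flightFilter(); the column names and all values are assumptions for illustration.

```r
# Toy flights table; all values are hypothetical
flights <- data.frame(
  Price = c(850, 1200, 640, 990),
  No.Stops = c(1L, 0L, 2L, 1L),
  DepartureTime = as.POSIXct(
    c("2019-06-01 09:30", "2019-06-01 10:15",
      "2019-06-01 12:00", "2019-06-01 06:45"),
    tz = "UTC"
  )
)

# Departure time of day, in minutes since midnight
dep_minutes <- as.integer(format(flights$DepartureTime, "%H")) * 60 +
  as.integer(format(flights$DepartureTime, "%M"))

# Budget of 1000, at most one stop, departure between 08:00 and 24:00
keep <- flights$Price <= 1000 &
  flights$No.Stops <= 1 &
  dep_minutes >= 8 * 60 & dep_minutes <= 24 * 60

flights[keep, ]
```

Only the first row passes all three conditions: the second is over budget, the third has two stops, and the fourth departs before 8 AM.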

Data Storage

Storing flight data in a database can be efficient for automatic searching.

dbCreateDB()

dbCreateDB() connects to a local database file, “flight.db” by default. This step must be done before saving data in the database.

dbCreateDB(conn = RSQLite::SQLite(), dbname = "flight.db")

The flight.db includes seven tables:

#> [1] "agent"     "carrier"   "itinerary" "leg"       "place"     "price"    
#> [7] "segment"

dbCreateDB() performs the following steps:

  1. Connect to the SQLite driver.
  2. Create a local database file if it doesn’t exist.
  3. Create the schema of the above seven tables if they don’t exist.

dbSaveData()

dbSaveData() is a function to save data into the database file.

resp <- apiCreateSession(origin = "DSM", destination = "DTW", startDate = "2019-06-01")
resp <- apiPollSession(resp)
data <- flightGet(resp)

# Connect to SQLite database
con <- dbCreateDB(dbname = ":memory:")
dbSaveData(resp, con)  # from response
dbSaveData(data, con)  # from list
dbDisconnect(con)

It accepts two classes of input: response or list. A response is the object returned by apiPollSession(); a list is the data returned by flightGet().

Automatic Data Download

Important Notice

A feature that makes the flightscanner package unique, compared with existing flight search engines, is its ability to run flight searches automatically on a schedule.

These functions currently work only on Unix/Linux/macOS, not on Windows. Windows support is planned for the future.

If you use macOS and encounter the “Operation not permitted” error, follow these instructions:

  1. Pull down the Apple menu and choose “System Preferences”
  2. Choose “Security & Privacy” control panel
  3. Now select the “Privacy” tab, then from the left-side menu select “Full Disk Access”
  4. Click the lock icon in the lower left corner of the preference panel and authenticate with an admin level login
  5. Now click the [+] plus button to add an application with full disk access
  6. Navigate to the /Applications/ folder and choose “RStudio.app” or “R” to grant it with Full Disk Access privileges
  7. Relaunch RStudio or R; the “Operation not permitted” error messages should be gone

Create and Manage Jobs

Cron jobs are created with the cron_create() function. Besides the regular flight information (such as origin, destination, and dates), an additional argument “frequency” sets the job schedule. It can be “minutely”, “hourly”, “daily”, or any other frequency expressed in cron syntax. Here are two examples:

cron_create("DSM", "SEA", "2019-07-20", frequency = "hourly")  # every hour
cron_create("DSM", "PVG", "2019-06-01", frequency = "0 */2 * * *")  # every 2 hours
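
For readers unfamiliar with cron syntax, an expression such as "0 */2 * * *" has five space-separated fields: minute, hour, day of month, month, and day of week. The base-R sketch below splits an expression into these fields and expands a simple hour field. The helpers cron_fields() and expand_hours() are hypothetical, are not part of flightscanner, and cover only the "*" and "*/n" forms.

```r
# Split a five-field cron expression into its named parts
cron_fields <- function(expr) {
  parts <- strsplit(trimws(expr), "[[:space:]]+")[[1]]
  stopifnot(length(parts) == 5)
  setNames(parts, c("minute", "hour", "day_of_month", "month", "day_of_week"))
}

# Expand an hour field of the form "*" or "*/n" into the hours it matches.
# Other cron forms (lists, ranges) are deliberately not handled here.
expand_hours <- function(field) {
  if (field == "*") return(0:23)
  stopifnot(grepl("^\\*/[0-9]+$", field))
  step <- as.integer(sub("^\\*/", "", field))
  seq(0, 23, by = step)
}

fields <- cron_fields("0 */2 * * *")
fields

# Hours at which the job would run (at minute 0)
run_hours <- expand_hours(fields[["hour"]])
run_hours
```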

This function generates a log file and a database file. All scheduled search results are stored in this database file, e.g. “flight.db”.

# connect to SQLite database
con <- dbCreateDB(dbname = "flight.db")
# read data from database
data <- flightGet(con)  
# show the searching time
unique(data$price$SearchTime)
#> [1] "2019-05-05 09:00:13 CDT" "2019-05-05 10:00:05 CDT"
#> [3] "2019-05-05 13:00:12 CDT" "2019-05-05 14:00:07 CDT"
#> [5] "2019-05-05 15:00:04 CDT" "2019-05-05 16:00:05 CDT"
# disconnect database
dbDisconnect(con)
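
Once several scheduled searches have accumulated, the lowest price found at each search time can be tracked. The sketch below shows the idea in base R on a toy table; the two search times are taken from the output above, stored as plain character strings for simplicity, and the prices are invented.

```r
# Toy stand-in for data$price flattened to one row per quote;
# the prices are hypothetical.
quotes <- data.frame(
  SearchTime = rep(c("2019-05-05 09:00:13", "2019-05-05 10:00:05"), each = 2),
  Price = c(410, 395, 402, 388),
  stringsAsFactors = FALSE
)

# Lowest quoted price at each search time
lowest <- tapply(quotes$Price, quotes$SearchTime, min)
lowest

# Change in the lowest price between consecutive searches
price_change <- diff(as.numeric(lowest))
price_change
```

A negative change means that the cheapest available ticket became cheaper between two searches.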

To show the current searching jobs, run the function:

The job will be executed automatically even if R is closed or the computer is restarted. To stop the job, run:

cron_clear(ask = FALSE)

Shiny App

To open the Shiny App, run:

shiny::runApp(system.file(package = "flightscanner", "shiny"))

The Shiny App for the flightscanner includes three tabs: Airport Map, Flights and IATA Code.

Airport Map

This tab is the default welcome page. The leaflet map shows the locations of the selected airports and gives a rough idea of how far the users need to travel. To run a flight search, fill in the inputs at the top of the map:

  • Trip type: one-way trip or round trip.
  • From, To: 3-character code for the departure/destination airport.
  • Dept.Date, Arr.Date: departure date (no earlier than the current date) and return date (no earlier than the departure date). The Arr.Date box appears only when round trip is selected.

Click on the Go! button after providing the trip information.

Flights

Click on the Flights tab after the search is complete. There are several filter options in the left panel.

  • Price: a slider ranging from the minimum to the maximum of the ticket prices.
  • Airlines Includes, Airlines Excludes: the users can include or exclude some specific airlines.
  • Duration: a slider ranging from the minimum to the maximum of the trip duration.
  • Stops: the users can specify their preferred number of transition stops.
  • Layover: the total time to be spent at the transition stops.
  • Outbound, Inbound: the users can choose a range of time for the departure time and arrival time, for both outbound flight and inbound flight using a 24-hour clock.

A table containing detailed information about the filtered flights is shown in the main panel on the right, including the ticket price, the departure and arrival times for the outbound and/or inbound flight, the duration, and the number of stops for the outbound and/or inbound flight. The Link column contains hyperlinks to the ticketing agencies.

IATA Code

Under this tab, users can search for the 3-character codes of the target airports by entering city or country names in the search box in the upper right corner. The data comes from MUCflight.