Main Functions
Download data with API
apiCreateSession()
apiCreateSession() lets users submit their flight information (origin, destination, and dates) and creates a search session on the API server. The output contains a session ID. For example, to search for a ticket from Des Moines (DSM) to Detroit (DTW) for one adult on 2019-06-01 (the departure date cannot be earlier than the current date):
dsm2dtw_session <- apiCreateSession(origin = "DSM", destination = "DTW", startDate = "2019-06-01", adults = 1)
The output of apiCreateSession() is used as the input of apiPollSession().
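Before calling the API, it can help to catch malformed input locally. The snippet below is a minimal sketch of such a check. check_flight_query() is a hypothetical helper, not part of the package, and it assumes that origin and destination are given as three-letter IATA airport codes, as in the example above. The today argument exists only so that the date check can be reproduced with a fixed date.

```r
# Hypothetical helper (not part of the package): validate the query
# arguments before passing them to apiCreateSession().
check_flight_query <- function(origin, destination, startDate, adults = 1,
                               today = Sys.Date()) {
  # Assumption: places are given as three-letter IATA codes such as "DSM"
  is_iata <- function(x) {
    is.character(x) && length(x) == 1 && grepl("^[A-Z]{3}$", x)
  }
  if (!is_iata(origin)) stop("origin must be a three-letter IATA code")
  if (!is_iata(destination)) stop("destination must be a three-letter IATA code")

  # The date must parse as YYYY-MM-DD and must not lie in the past
  date <- as.Date(startDate, format = "%Y-%m-%d")
  if (is.na(date)) stop("startDate must use the YYYY-MM-DD format")
  if (date < today) stop("startDate cannot be earlier than the current date")

  if (!is.numeric(adults) || length(adults) != 1 || adults < 1) {
    stop("adults must be a single number of at least 1")
  }
  invisible(TRUE)
}
```

For example, check_flight_query("DSM", "DTW", "2019-06-01", today = as.Date("2019-05-09")) returns TRUE invisibly, whereas the same call with startDate = "2019-05-01" stops with an error.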
apiPollSession()
apiPollSession() retrieves the flights found by apiCreateSession() and lets users sort and filter the tickets by various criteria. All filter arguments default to NULL, meaning that no filter is applied when the data are retrieved. For example, to retrieve the previous search results sorted by price in ascending order:
dsm2dtw_res <- apiPollSession(response = dsm2dtw_session, sortType = "price", sortOrder = "asc")
Let’s check the contents of the apiPollSession() output:
dsm2dtw_res %>% content %>% names
#> [1] "SessionKey" "Query" "Status" "Itineraries" "Legs"
#> [6] "Segments" "Carriers" "Agents" "Places" "Currencies"
The output of apiPollSession() is hard to read because it contains several nested sub-lists, such as “Itineraries”, “Legs”, and “Segments”. The relationship between these terms is shown below.
\[
\text{search result}
\begin{cases}
\text{itinerary}_1 \begin{cases}
  \text{leg}_1 \begin{cases} \text{segment}_1 \\ \text{segment}_2 \\ \vdots \\ \text{segment}_S \end{cases} \\
  \text{leg}_2 \begin{cases} \text{segment}_1 \end{cases}
\end{cases} \\
\text{itinerary}_2 \begin{cases}
  \text{leg}_1 \begin{cases} \text{segment}_1 \\ \text{segment}_2 \end{cases} \\
  \text{leg}_2 \begin{cases} \text{segment}_1 \end{cases}
\end{cases} \\
\vdots \\
\text{itinerary}_n \begin{cases}
  \text{leg}_1 \begin{cases} \text{segment}_1 \end{cases} \\
  \text{leg}_2 \begin{cases} \text{segment}_1 \end{cases}
\end{cases}
\end{cases}
\]
One search request may return several itineraries. A one-way trip contains one leg, whereas a round trip contains two: an outbound leg and an inbound leg. A leg consists of several segments if it is not a direct flight.
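To make the nesting concrete, here is a toy search result written as nested R lists. The airport pairs (including the hub "HUB") and the leg names outbound/inbound are made up for illustration; the real API refers to legs and segments by ID. The sketch also assumes, as a simplification, that the number of stops in a leg equals its number of segments minus one, which agrees with the SegmentIds and No.Stops columns of the “legs” data frame shown later.

```r
# Toy search result: itineraries -> legs -> segments (made-up values)
search_result <- list(
  itinerary_1 = list(                    # round trip: two legs
    outbound = c("DSM-HUB", "HUB-DTW"),  # connecting flight: two segments
    inbound  = c("DTW-DSM")              # direct flight: one segment
  ),
  itinerary_2 = list(                    # one-way trip: one leg
    outbound = c("DSM-DTW")              # direct flight: one segment
  )
)

# Number of legs in each itinerary (1 = one-way, 2 = round trip)
n_legs <- vapply(search_result, length, integer(1))

# Stops per leg, assuming stops = number of segments - 1
n_stops <- lapply(search_result, function(itinerary) {
  vapply(itinerary, function(leg) length(leg) - 1L, integer(1))
})

n_legs  # itinerary_1 has 2 legs, itinerary_2 has 1
```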
Data Processing
flightGet()
flightGet() takes either the result of apiPollSession() or data read from the database (explained later in the “Data Storage” section). It returns a list of seven data frames, whose names are printed below:
dsm2dtw_df <- dsm2dtw_res %>% flightGet()
#> Warning: Unmatch of Segments and Stops: [LegId] =
#> 11140-1906010743--30963,-32462-3-11152-1906020959
#> Warning: Unmatch of Segments and Stops: [LegId] =
#> 11140-1906011413--30963-1-11152-1906020959
#> Warning: Unmatch of Segments and Stops: [LegId] =
#> 11140-1906010713--30963-1-11152-1906011729
names(dsm2dtw_df)
#> [1] "price" "itineraries" "legs" "segments" "carriers"
#> [6] "agents" "places"
The data frame “price” provides information such as the search time and the pricing options:
dsm2dtw_df$price %>% head(3) %>% print(width = 120)
#> # A tibble: 3 x 4
#> SearchTime OutboundLegId
#> <dttm> <chr>
#> 1 2019-05-09 19:55:01 11140-1906011835--31722-1-11152-1906012344
#> 2 2019-05-09 19:55:01 11140-1906010615--31722-1-11152-1906011349
#> 3 2019-05-09 19:55:01 11140-1906011520--31722-1-11152-1906012026
#> InboundLegId PricingOptions
#> <chr> <list>
#> 1 "" <tibble [1 × 3]>
#> 2 "" <tibble [1 × 3]>
#> 3 "" <tibble [1 × 3]>
Within the same itinerary, there may be several prices offered by different agents:
dsm2dtw_df$price$PricingOptions[[39]] %>% print(width = 120)
#> # A tibble: 2 x 3
#> AgentId Price
#> <int> <dbl>
#> 1 1960211 4037.
#> 2 3987731 4052.
#> LinkURL
#> <chr>
#> 1 http://partners.api.skyscanner.net/apiservices/deeplink/v2?_cje=NqagL4PV…
#> 2 http://partners.api.skyscanner.net/apiservices/deeplink/v2?_cje=NqagL4PV…
The data frame “legs” provides information such as the duration (in minutes) and the number of stops:
dsm2dtw_df$legs %>% head(3) %>% print(width = 120)
#> # A tibble: 3 x 9
#> Id SegmentIds OriginId DestinationId DepartureTime
#> <chr> <list> <int> <int> <dttm>
#> 1 11140-1906010605--… <chr [3]> 11140 11152 2019-06-01 06:05:00
#> 2 11140-1906011658--… <chr [3]> 11140 11152 2019-06-01 16:58:00
#> 3 11140-1906011705--… <chr [2]> 11140 11152 2019-06-01 17:05:00
#> ArrivalTime Duration No.Stops Stops
#> <dttm> <int> <int> <list>
#> 1 2019-06-01 18:14:00 669 2 <data.frame [2 × 2]>
#> 2 2019-06-02 14:20:00 1222 2 <data.frame [2 × 2]>
#> 3 2019-06-02 22:25:00 1700 1 <data.frame [1 × 2]>
The “Stops” column of the “legs” data frame also lists, for each leg, the stop IDs and the layover at each stop in minutes:
dsm2dtw_df$legs$Stops %>% head(3) %>% print(width = 120)
#> [[1]]
#> StopId Layover
#> 1 12389 81
#> 2 15062 191
#>
#> [[2]]
#> StopId Layover
#> 1 15062 70
#> 2 16177 525
#>
#> [[3]]
#> StopId Layover
#> 1 10959 1416
Similarly, detailed information about each segment is stored in the “segments” data frame:
dsm2dtw_df$segments %>% head(2) %>% print(width = 120)
#> # A tibble: 2 x 9
#> Id OriginId DestinationId DepartureTime ArrivalTime
#> <chr> <int> <int> <dttm> <dttm>
#> 1 11140-190… 11140 12389 2019-06-01 06:05:00 2019-06-01 08:41:00
#> 2 12389-190… 12389 15062 2019-06-01 10:02:00 2019-06-01 12:39:00
#> Duration CarrierId OperatingCarrierId FlightNumber
#> <int> <int> <int> <chr>
#> 1 156 1793 -676 6115
#> 2 157 1793 1793 1403
In the above outputs, carriers and places (including stops) are represented by their IDs. To “translate” the IDs into names, look them up in the “carriers” and “places” data frames:
dsm2dtw_df$carriers %>% head(1) %>% print(width = 120)
#> # A tibble: 1 x 4
#> Id Code Name
#> <int> <chr> <chr>
#> 1 -676 "" Mesa Airlines DBA United Express
#> ImageURL
#> <chr>
#> 1 https://s1.apideeplink.com/images/airlines/default.png
dsm2dtw_df$places %>% head(1) %>% print(width = 120)
#> # A tibble: 1 x 5
#> Id ParentId Code Type Name
#> <int> <int> <chr> <chr> <chr>
#> 1 11140 2266 DSM Airport Des Moines
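Because the data frames returned by flightGet() are linked through IDs, much like tables in a relational database, the lookups can also be done in code. The snippet below is a minimal base-R sketch, not a package function: it rebuilds the two segments printed above (leaving out the truncated Id column), looks up the operating carrier and origin airport by ID with match(), and recomputes the layover at the connecting airport. Only the carrier and place rows printed above are included, so IDs without a match come back as NA. The new column names OperatingCarrier and OriginCode are made up for illustration. The times are parsed as UTC purely for arithmetic; both times in the layover refer to the same airport, so no time-zone conversion is needed.

```r
# The two segments printed above (Id column omitted because it is truncated)
segments <- data.frame(
  OriginId           = c(11140L, 12389L),
  DestinationId      = c(12389L, 15062L),
  DepartureTime      = as.POSIXct(c("2019-06-01 06:05:00", "2019-06-01 10:02:00"), tz = "UTC"),
  ArrivalTime        = as.POSIXct(c("2019-06-01 08:41:00", "2019-06-01 12:39:00"), tz = "UTC"),
  OperatingCarrierId = c(-676L, 1793L)
)

# Lookup tables: only the rows printed above
carriers <- data.frame(Id = -676L, Name = "Mesa Airlines DBA United Express",
                       stringsAsFactors = FALSE)
places <- data.frame(Id = 11140L, Code = "DSM", Name = "Des Moines",
                     stringsAsFactors = FALSE)

# Translate IDs into names; unmatched IDs become NA
segments$OperatingCarrier <- carriers$Name[match(segments$OperatingCarrierId, carriers$Id)]
segments$OriginCode <- places$Code[match(segments$OriginId, places$Id)]

# Layover at each connection: next departure minus previous arrival, in minutes
layover <- as.numeric(difftime(segments$DepartureTime[-1],
                               segments$ArrivalTime[-nrow(segments)],
                               units = "mins"))
layover
```

The result, 81 minutes, agrees with the layover reported for stop 12389 in the first element of the Stops column above.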
flightFilter()
flightFilter() filters the results obtained from flightGet(). Continuing with the previous example, suppose the user looks for flights within a budget of $1,000, with no more than one stop, and with a departure time after 8 AM:
flightFilter(dsm2dtw_df, max_price = 1000, max_stops = 1, out_departure = c("08:00","24:00")) %>% head(3)
#> # A tibble: 3 x 15
#> OutboundLegId InboundLegId PricingOptions OutboundLegSegm…
#> <chr> <chr> <list> <list>
#> 1 11140-190601… "" <tibble [1 × … <data.frame [2 …
#> 2 11140-190601… "" <tibble [1 × … <data.frame [2 …
#> 3 11140-190601… "" <tibble [1 × … <data.frame [2 …
#> # … with 11 more variables: OutboundLegDepartureTime <dttm>,
#> # OutboundLegArrivalTime <dttm>, OutboundLegDuration <int>,
#> # OutboundLegNo.Stops <int>, OutboundLegStops <list>,
#> # InboundLegSegments <list>, InboundLegDepartureTime <dttm>,
#> # InboundLegArrivalTime <dttm>, InboundLegDuration <int>,
#> # InboundLegNo.Stops <int>, InboundLegStops <list>
Data Storage
Storing flight data in a database can be efficient for automated, repeated searches.
dbCreateDB()
dbCreateDB() connects to a local database file (“flight.db” by default). This is the setup step required before saving data to the database.
dbCreateDB(conn = RSQLite::SQLite(), dbname = "flight.db")
The flight.db database includes seven tables:
#> [1] "agent" "carrier" "itinerary" "leg" "place" "price"
#> [7] "segment"
The function performs the following steps:
- connects to the SQLite driver;
- creates the local database file if it does not exist;
- creates the schemas of the seven tables above if they do not exist.
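The steps above map onto standard DBI calls. The following is a minimal sketch, assuming the DBI and RSQLite packages are installed; it is not the package’s own implementation, and the two table schemas are simplified guesses modeled on the columns of the “places” and “carriers” data frames printed earlier. It uses an in-memory database so nothing is written to disk; with a file path instead of ":memory:", dbConnect() creates the file if it does not already exist.

```r
library(DBI)

# Connect to the SQLite driver (in-memory database for this sketch)
con <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")

# Create simplified schemas only if they do not exist yet (assumed columns)
dbExecute(con, "CREATE TABLE IF NOT EXISTS place (
  Id INTEGER PRIMARY KEY, ParentId INTEGER, Code TEXT, Type TEXT, Name TEXT)")
dbExecute(con, "CREATE TABLE IF NOT EXISTS carrier (
  Id INTEGER PRIMARY KEY, Code TEXT, Name TEXT, ImageURL TEXT)")

# Re-running a CREATE TABLE IF NOT EXISTS statement is a harmless no-op
dbExecute(con, "CREATE TABLE IF NOT EXISTS place (Id INTEGER PRIMARY KEY)")

# Append the place row printed earlier
dbWriteTable(con, "place",
             data.frame(Id = 11140L, ParentId = 2266L, Code = "DSM",
                        Type = "Airport", Name = "Des Moines"),
             append = TRUE)

tables  <- dbListTables(con)
n_place <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM place")$n
dbDisconnect(con)
```

The IF NOT EXISTS clause is what makes the setup safe to run before every save: an existing database keeps its tables and data.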
dbSaveData()
dbSaveData() saves data into the database file.
resp <- apiCreateSession(origin = "DSM", destination = "DTW", startDate = "2019-06-01")
resp <- apiPollSession(resp)
data <- flightGet(resp)
# Connect to an in-memory SQLite database (nothing is written to disk)
con <- dbCreateDB(dbname = ":memory:")
dbSaveData(resp, con)  # save from an apiPollSession() response
dbSaveData(data, con)  # save from a flightGet() list
DBI::dbDisconnect(con)
It accepts two classes of input: response or list. A response is the request response returned by apiPollSession(), and a list is the data returned by flightGet().
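Accepting both input classes amounts to a simple type check before saving. The snippet below is a hypothetical sketch of that dispatch, not the package’s code. It assumes that apiPollSession() returns an httr response object (class "response"), as the use of content() earlier suggests, and it takes the parsing function as an argument (parse_response) so that it runs without an API session; in practice that role would be played by flightGet().

```r
# Hypothetical helper: turn either input class into a list of data frames
normalize_flight_input <- function(x, parse_response) {
  # An httr response is itself a list, so it must be checked first;
  # testing is.list() first would misclassify responses as parsed data.
  if (inherits(x, "response")) {
    parse_response(x)          # raw API response: parse it into tables
  } else if (is.list(x)) {
    x                          # already parsed, e.g. the output of flightGet()
  } else {
    stop("input must be an httr response or a list of data frames")
  }
}
```

The order of the checks is the design choice that matters here: the more specific class test runs before the generic list test.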



