Main Functions
Download data with API
apiCreateSession()
apiCreateSession() lets users enter their flight information (origin, destination, and dates) and creates a session on the API server. The output contains a session ID. For example, to search for a ticket from Des Moines to Detroit for one adult on 2019-06-01 (the departure date cannot be earlier than the current date):
dsm2dtw_session <-
apiCreateSession(origin = "DSM", destination = "DTW", startDate = "2019-06-01", adults = 1)
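The date constraint mentioned above can be checked before a session is created. The following is a minimal sketch in base R; `is_valid_start_date()` is a hypothetical helper for illustration and is not part of the package.

```r
# Hypothetical helper (not part of the package): returns TRUE when the
# departure date is a valid "YYYY-MM-DD" string that is not in the past.
is_valid_start_date <- function(start_date, today = Sys.Date()) {
  parsed <- as.Date(start_date, format = "%Y-%m-%d")  # NA if unparseable
  !is.na(parsed) && parsed >= today
}

# 'today' is fixed here only to make the example reproducible
is_valid_start_date("2019-06-01", today = as.Date("2019-05-09"))
is_valid_start_date("2019-05-01", today = as.Date("2019-05-09"))
```

The first call passes the check and the second does not, since its date precedes the reference date.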
The output of apiCreateSession() is used as the input of apiPollSession().
apiPollSession()
apiPollSession() retrieves the flight data for the session created with apiCreateSession() and lets users sort and filter the tickets by various criteria. All filter arguments default to NULL, meaning that nothing is filtered out before the data is returned. For example, to retrieve the previous search results sorted by price in ascending order:
dsm2dtw_res <- apiPollSession(response = dsm2dtw_session, sortType = "price", sortOrder = "asc")
Let’s check the content of the output of apiPollSession():
dsm2dtw_res %>% content %>% names
#> [1] "SessionKey" "Query" "Status" "Itineraries" "Legs"
#> [6] "Segments" "Carriers" "Agents" "Places" "Currencies"
The output of apiPollSession() is messy because it contains several sub-lists, such as “Itineraries”, “Legs”, and “Segments”. The relationship between these terms is shown below.
\[
\text{search result}
\begin{cases}
\text{itinerary}_1
  \begin{cases}
  \text{leg}_1 \begin{cases} \text{segment}_1 \\ \text{segment}_2 \\ \vdots \\ \text{segment}_S \end{cases} \\
  \text{leg}_2 \begin{cases} \text{segment}_1 \end{cases}
  \end{cases} \\
\text{itinerary}_2
  \begin{cases}
  \text{leg}_1 \begin{cases} \text{segment}_1 \\ \text{segment}_2 \end{cases} \\
  \text{leg}_2 \begin{cases} \text{segment}_1 \end{cases}
  \end{cases} \\
\vdots \\
\text{itinerary}_n
  \begin{cases}
  \text{leg}_1 \begin{cases} \text{segment}_1 \end{cases} \\
  \text{leg}_2 \begin{cases} \text{segment}_1 \end{cases}
  \end{cases}
\end{cases}
\]
A single search request may return several itineraries. A one-way trip contains one leg, whereas a round trip contains two: an outbound leg and an inbound leg. A leg contains several segments if it is not a direct flight.
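To make the hierarchy concrete, here is a minimal sketch in base R built on a hand-made nested list. The structure and the field names (`legs`, `segments`) are assumptions chosen for illustration; they do not mirror the actual layout of the API response.

```r
# Hypothetical mock of one search result: a round trip with two itineraries.
# Field names and routes are illustrative only.
search_result <- list(
  itinerary_1 = list(
    legs = list(
      outbound = list(segments = c("DSM-ORD", "ORD-DTW")),  # one stop
      inbound  = list(segments = c("DTW-DSM"))              # direct
    )
  ),
  itinerary_2 = list(
    legs = list(
      outbound = list(segments = c("DSM-DTW")),
      inbound  = list(segments = c("DTW-DSM"))
    )
  )
)

# Number of legs per itinerary (2 for a round trip, 1 for a one-way trip)
n_legs <- sapply(search_result, function(it) length(it$legs))

# Number of stops per leg = number of segments - 1
n_stops <- lapply(search_result, function(it) {
  sapply(it$legs, function(leg) length(leg$segments) - 1)
})

n_legs
n_stops$itinerary_1
```

Counting segments per leg in this way is what distinguishes a direct flight (one segment) from a connecting one.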
Data Processing
flightGet()
flightGet() takes either the result of apiPollSession() or data read from the database (explained later in the “Data Storage” section). The output is a list of seven dataframes, whose names are printed below:
dsm2dtw_df <- dsm2dtw_res %>% flightGet()
#> Warning: Unmatch of Segments and Stops: [LegId] =
#> 11140-1906010743--30963,-32462-3-11152-1906020959
#> Warning: Unmatch of Segments and Stops: [LegId] =
#> 11140-1906011413--30963-1-11152-1906020959
#> Warning: Unmatch of Segments and Stops: [LegId] =
#> 11140-1906010713--30963-1-11152-1906011729
names(dsm2dtw_df)
#> [1] "price" "itineraries" "legs" "segments" "carriers"
#> [6] "agents" "places"
The “price” dataframe provides information such as the search time and the pricing options:
dsm2dtw_df$price %>% head(3) %>% print(width = 120)
#> # A tibble: 3 x 4
#> SearchTime OutboundLegId
#> <dttm> <chr>
#> 1 2019-05-09 19:55:01 11140-1906011835--31722-1-11152-1906012344
#> 2 2019-05-09 19:55:01 11140-1906010615--31722-1-11152-1906011349
#> 3 2019-05-09 19:55:01 11140-1906011520--31722-1-11152-1906012026
#> InboundLegId PricingOptions
#> <chr> <list>
#> 1 "" <tibble [1 × 3]>
#> 2 "" <tibble [1 × 3]>
#> 3 "" <tibble [1 × 3]>
Within the same itinerary, there might be several different prices due to different agents:
dsm2dtw_df$price$PricingOptions[[39]] %>% print(width = 120)
#> # A tibble: 2 x 3
#> AgentId Price
#> <int> <dbl>
#> 1 1960211 4037.
#> 2 3987731 4052.
#> LinkURL
#> <chr>
#> 1 http://partners.api.skyscanner.net/apiservices/deeplink/v2?_cje=NqagL4PV…
#> 2 http://partners.api.skyscanner.net/apiservices/deeplink/v2?_cje=NqagL4PV…
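Because each itinerary carries its own table of quotes, a common next step is to pick the cheapest quote per itinerary. The sketch below uses a mock list of data frames shaped like the PricingOptions column; the agent IDs and prices are invented for illustration.

```r
# Mock of a PricingOptions list column: one data frame per itinerary.
# Agent IDs and prices are invented for illustration.
pricing_options <- list(
  data.frame(AgentId = c(101L, 102L), Price = c(265, 250)),
  data.frame(AgentId = 103L, Price = 389)
)

# Cheapest quote for each itinerary
min_price <- vapply(pricing_options, function(opt) min(opt$Price), numeric(1))

# Agent offering that cheapest quote
best_agent <- vapply(
  pricing_options,
  function(opt) opt$AgentId[which.min(opt$Price)],
  integer(1)
)

data.frame(min_price, best_agent)
```

`vapply()` is used instead of `sapply()` so that the result type is fixed even when the list is empty.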
The “legs” dataframe provides information such as the duration and the number of stops:
dsm2dtw_df$legs %>% head(3) %>% print(width = 120)
#> # A tibble: 3 x 9
#> Id SegmentIds OriginId DestinationId DepartureTime
#> <chr> <list> <int> <int> <dttm>
#> 1 11140-1906010605--… <chr [3]> 11140 11152 2019-06-01 06:05:00
#> 2 11140-1906011658--… <chr [3]> 11140 11152 2019-06-01 16:58:00
#> 3 11140-1906011705--… <chr [2]> 11140 11152 2019-06-01 17:05:00
#> ArrivalTime Duration No.Stops Stops
#> <dttm> <int> <int> <list>
#> 1 2019-06-01 18:14:00 669 2 <data.frame [2 × 2]>
#> 2 2019-06-02 14:20:00 1222 2 <data.frame [2 × 2]>
#> 3 2019-06-02 22:25:00 1700 1 <data.frame [1 × 2]>
The “legs” dataframe also records, for each leg, the stops and the layover at each stop in minutes:
dsm2dtw_df$legs$Stops %>% head(3) %>% print(width = 120)
#> [[1]]
#> StopId Layover
#> 1 12389 81
#> 2 15062 191
#>
#> [[2]]
#> StopId Layover
#> 1 15062 70
#> 2 16177 525
#>
#> [[3]]
#> StopId Layover
#> 1 10959 1416
Similarly, the detailed results about the segments are stored in the “segments” dataframe:
dsm2dtw_df$segments %>% head(2) %>% print(width = 120)
#> # A tibble: 2 x 9
#> Id OriginId DestinationId DepartureTime ArrivalTime
#> <chr> <int> <int> <dttm> <dttm>
#> 1 11140-190… 11140 12389 2019-06-01 06:05:00 2019-06-01 08:41:00
#> 2 12389-190… 12389 15062 2019-06-01 10:02:00 2019-06-01 12:39:00
#> Duration CarrierId OperatingCarrierId FlightNumber
#> <int> <int> <int> <chr>
#> 1 156 1793 -676 6115
#> 2 157 1793 1793 1403
In the above outputs, the carriers and stops are represented with their IDs. To “translate” to their names, run:
dsm2dtw_df$carriers %>% head(1) %>% print(width = 120)
#> # A tibble: 1 x 4
#> Id Code Name
#> <int> <chr> <chr>
#> 1 -676 "" Mesa Airlines DBA United Express
#> ImageURL
#> <chr>
#> 1 https://s1.apideeplink.com/images/airlines/default.png
dsm2dtw_df$places %>% head(1) %>% print(width = 120)
#> # A tibble: 1 x 5
#> Id ParentId Code Type Name
#> <int> <int> <chr> <chr> <chr>
#> 1 11140 2266 DSM Airport Des Moines
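The lookup tables can be joined back onto the other dataframes to replace IDs with readable codes. The following base R sketch uses `match()` on small mock tables; the IDs and their mapping to airport codes are invented and do not correspond to the real IDs shown above.

```r
# Mock lookup table; the Id-to-Code mapping is invented for illustration
places <- data.frame(
  Id   = c(1L, 2L, 3L),
  Code = c("DSM", "ORD", "DTW"),
  stringsAsFactors = FALSE
)

# Mock segments that reference places by Id
segments <- data.frame(
  OriginId      = c(1L, 2L),
  DestinationId = c(2L, 3L)
)

# match() returns the row of 'places' for each Id, which indexes the codes
segments$Origin      <- places$Code[match(segments$OriginId, places$Id)]
segments$Destination <- places$Code[match(segments$DestinationId, places$Id)]

segments
```

The same pattern applies to carrier IDs, with the “carriers” dataframe as the lookup table.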
flightFilter()
flightFilter() lets users filter the results obtained from flightGet(). Continuing the previous example, suppose the user is looking for flights within a budget of $1,000, with no more than one stop, and departing after 8 AM:
flightFilter(dsm2dtw_df, max_price = 1000, max_stops = 1, out_departure = c("08:00","24:00")) %>% head(3)
#> # A tibble: 3 x 15
#> OutboundLegId InboundLegId PricingOptions OutboundLegSegm…
#> <chr> <chr> <list> <list>
#> 1 11140-190601… "" <tibble [1 × … <data.frame [2 …
#> 2 11140-190601… "" <tibble [1 × … <data.frame [2 …
#> 3 11140-190601… "" <tibble [1 × … <data.frame [2 …
#> # … with 11 more variables: OutboundLegDepartureTime <dttm>,
#> # OutboundLegArrivalTime <dttm>, OutboundLegDuration <int>,
#> # OutboundLegNo.Stops <int>, OutboundLegStops <list>,
#> # InboundLegSegments <list>, InboundLegDepartureTime <dttm>,
#> # InboundLegArrivalTime <dttm>, InboundLegDuration <int>,
#> # InboundLegNo.Stops <int>, InboundLegStops <list>
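For intuition, the three conditions in this example can be written out as plain logical filters. This is only a sketch on a flat mock data frame with invented values; it is not how flightFilter() is implemented, and the column names are assumptions.

```r
# Flat mock of itineraries; all values are invented for illustration
flights <- data.frame(
  Price = c(320, 1250, 410, 500),
  Stops = c(1L, 0L, 2L, 1L),
  Departure = as.POSIXct(
    c("2019-06-01 09:15:00", "2019-06-01 10:00:00",
      "2019-06-01 07:30:00", "2019-06-01 06:05:00"),
    tz = "UTC"
  )
)

# Zero-padded "HH:MM" strings compare correctly as text
dep_hm <- format(flights$Departure, "%H:%M")

keep <- flights$Price <= 1000 &   # budget
  flights$Stops <= 1 &            # at most one stop
  dep_hm >= "08:00" &             # departure window, lower bound
  dep_hm <= "24:00"               # departure window, upper bound

filtered <- flights[keep, ]
filtered
```

Only the first mock itinerary satisfies all three conditions; the others fail on price, number of stops, or departure time.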
Data Storage
Storing flight data in a database makes repeated, automated searches more efficient.
dbCreateDB()
dbCreateDB() connects to a local database file (by default, “flight.db”). Run it before saving any data to the database.
dbCreateDB(conn = RSQLite::SQLite(), dbname = "flight.db")
The flight.db includes seven tables:
#> [1] "agent" "carrier" "itinerary" "leg" "place" "price"
#> [7] "segment"
It performs the following steps:
- Connect to the SQLite driver.
- Create a local database file if it doesn’t exist.
- Create the schema of the above seven tables if they don’t exist.
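The steps above can be sketched directly with DBI and RSQLite. This is an illustration under stated assumptions, not the package's actual code: the column layout of the `place` table is a guess, and an in-memory database is used so that no file is written.

```r
library(DBI)

# Connect to the SQLite driver; ":memory:" avoids creating a file.
# With a file path, SQLite creates the file if it does not exist.
con <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")

# Create a table only if it does not exist yet.
# The columns are assumed for illustration, not the package's real schema.
create_place <- "
  CREATE TABLE IF NOT EXISTS place (
    Id   INTEGER PRIMARY KEY,
    Code TEXT,
    Name TEXT
  )
"
dbExecute(con, create_place)

# Running the same statement again is harmless because of IF NOT EXISTS
dbExecute(con, create_place)

tables <- dbListTables(con)
dbDisconnect(con)

tables
```

Using `IF NOT EXISTS` is what makes the setup safe to run repeatedly against the same database file.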
dbSaveData()
dbSaveData() saves data into the database file.
resp <- apiCreateSession(origin = "DSM", destination = "DTW", startDate = "2019-06-01")
resp <- apiPollSession(resp)
data <- flightGet(resp)
# Connect to SQLite database
con <- dbCreateDB(dbname = ":memory:")
dbSaveData(resp, con) # from response
dbSaveData(data, con) # from list
dbDisconnect(con)
It accepts two classes of input: response or list. A response is the object returned by apiPollSession(); a list is the data returned by flightGet().
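One way to accept both input classes is to branch on the class of the argument. The sketch below is hypothetical: `input_kind()` is not a package function, and the mock objects only imitate the classes involved. Note that a response object is itself a list, so the response check has to come first.

```r
# Hypothetical helper (not part of the package): report which kind of
# input was supplied, checking the more specific class first.
input_kind <- function(x) {
  if (inherits(x, "response")) {
    "response"
  } else if (is.list(x)) {
    "list"
  } else {
    stop("Input must be a response or a list.")
  }
}

# Mock objects that only imitate the two input classes
mock_response <- structure(list(status_code = 200L), class = "response")
mock_list     <- list(price = data.frame(), legs = data.frame())

input_kind(mock_response)
input_kind(mock_list)
```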