Finding Hidden City Flights using Neo4j & Python 🐍 Part #3

Sun, May 13, 2018 5-minute read

In this post, I’ll introduce you to hidden city ticketing using Python.

Hidden city ticketing lets you save a lot of money, especially when otherwise no other cheap fares are available. Of course we’ll do it the python and neo4j way!

This post is part of a series of posts. I highly recommend to also check out the previous post, in case you haven’t been following along from the start.

What is hidden city ticketing?

Let’s assume you wanna fly from Greensboro, NC (GSO) to New York Newark, NJ (EWR). It would cost you 179 € for a one-way flight. Not bad, but we can get there cheaper…

United Airlines direct flight from Greensboro to New York

If you search for a flight that connects in Newark, but continues somewhere else the price for the same flight drops to 133 €. Instead of booking a flight to Newark you would book one where the final destination is Providence and get off the plane in Newark.

United Airlines onestop flight from Greensboro to New York to Providence

Disclaimer: Hidden City is a great way to save money, but it also violates the terms of service of some airlines. You, as a passenger and customer are solely responsible for your actions. It’s not the authors intent to support you in doing anything fraudulent. Instead this post should be a proof of concept how neo4j and python can be combined to find low airline fares.

Note: Hidden City Ticketing only works for one-way flights, as the other segments become invalid if you didn’t fly them.

How to store the flight data in Neo4j?

Great if you made it past the disclaimer, let’s get started with the fun part. Before we can proceed, I’d like to talk about how I store the flights in Neo4j database.

def createNonStopInboundRate(origin, destination, flight_number,dtime, atime, price, distance, final_destination):
    query = ''' 
    MERGE (orig:Airport {code: {orig}, month: {month}, day:{day}})
    MERGE (dest:Airport {code: {dest}, month: {month}, day:{day}})
    MERGE (f:Flight{flight_number:{flight_number}, month: {month}, day:{day}})
    MERGE (m:Month{month:{month}})-[:HAS_DAY]->(d:Day{day:{day}})
    WITH orig, dest, f, m, d
    CREATE UNIQUE (m)-[:HAS_DAY]->(d)-[:HAS_AIRPORT{month: {month}, day:{day}}]->(orig)-[:HAS_FLIGHT{month:{month}, day:{day}, departure_time:{departure_time}}]->(f)-[:TO{month:{month}, day:{day}, departure_time: {departure_time}, arrival_time:{arrival_time}, price: {price}, duration: {duration}, distance: {distance}, final_destination: {final_destination}}]->(dest)
    '''
    parameters = {
			'month': int(float(datetime.datetime.fromtimestamp(dtime).strftime('%m'))),
			'day': int(float(datetime.datetime.fromtimestamp(dtime).strftime('%d'))),
			'orig': origin, 
			'dest': destination, 
			'flight_number': flight_number, 
			'price': price, 
			'duration': (atime-dtime),
			'distance': distance,
			'departure_time': dtime,
			'arrival_time': atime,
      'final_destination':final_destination
    }
    try:
        g = Graph(user='dbuser', password='dbpassword')
        g.run(query, parameters)
    except Exception:
        print 'Redudant pattern ignored.'
    print ("{} -> {} at {} EUR, flight code {} duration={}, departs at {} and arrivs at : {} on 2018/{}/{}".format(parameters['orig'], parameters['dest'], parameters['price'], parameters['flight_number'], parameters['duration'], parameters['departure_time'], parameters['arrival_time'], parameters['month'], parameters['day']))

I am calling this function for every segment of a flight that I search. This function creates the relationship between the date, the origin and destination airport and the flight which is also saved as a node. You may have also noticed that I am also passing a bunch of properties like distance and price.

Let us look at a simple snippet of code which I use for looping over the segments of the flights and insert them into the database.

def executeMultiStopQuery(origin, destination):
    s = kiwi.Search()
    res = s.search_flights(flyFrom=origin, to=destination, dateFrom=arrow.utcnow().shift(days=+1).format('DD/MM/YYYY'), dateTo=arrow.utcnow().shift(days=+20).format('DD/MM/YYYY'), partner='picky', directFlights=0, asc=1)
    res = res.json()
    for r in res['data']:
        print '#'*20
        flight_number = ''
        for i, route in enumerate(r['route']):
            if i != len(r['route']) and i != 0:
                flight_number += "-"
            flight_number += route['airline']+str(route['flight_no'])
            print("[{}] {}{}: {} -> {}, price: {} EUR, distance: {}").format(i,route['airline'],route['flight_no'], route['flyFrom'], route['flyTo'], r['price'], r['distance'])
            # Distance represents total distance!
            createNonStopInboundRate(origin, route['flyTo'], flight_number, r['dTimeUTC'], route['aTimeUTC'], r['price'], r['distance'], destination)

The MERGE-Clause is really useful in this case, as it only creates the nodes if they are not already stored in the database. That way I can avoid having hundreds of nodes for December.

The image below shows what the origin-, destination airport, day, month and flight node look like once they’ve been inserted into Neo4j. It shows a number of possible connections from GSO to EWR on 06/06/2018. Prior to creating this image I’ve searched and inserted GSO->EWR and GSO->PVD.

HiddenCity_Neo4j

The Cypher Query for finding these nonstop connections is fairly simple:

MATCH con= (:Month{month:6})-[:HAS_DAY]->(:Day{day:6})-[:HAS_AIRPORT]->(:Airport{code:'GSO'})-[r*..6]->(:Airport{code:'EWR'}) return con

You may be wondering what these connections with multiple flight numbers are. These are additional flights (“hidden flights”) I am inserting whenever a search consists out of multiple flights. I basically join the flight numbers and also save the final destination as property. That way I have much more combinations in my database even if I haven’t proactively searched them and I’ll be able to retrieve them using my standard query. Note, we can’t just insert a single relationship between the airports of the segments, as the flights have to be taken in the specific order as ticketed.

How to find them the pythonic way?

You may now be saying how did I know that PVD is a possible destination for our journey which drops the price of our flight and still connects in EWR. At first, I was searching the direct connections departing from EWR from openflights.org’ route list and then inserting those into the database.

def getConnectionAirports(hiddencity):
    list_airports = []
    with open('routes.csv', 'rb') as csvfile:
        airportreader = csv.DictReader(csvfile)
        for row in airportreader:
            if row['Source Airport'] == hiddencity:
                list_airports.append(row['Destination Airport'])
    return list_airports

And in the next step, I am basically searching this list of airports as my destination. We could swap the search_flights function for the search_flights_multi, but I deliberately kept it as I get more connections (even if they might not even connect in EWR) which later on might come in handy.

for destination in getConnectionAirports('EWR'):
    executeMultiStopQuery('GSO', destination)

The last step would be to query the database for the cheapest connection, whether it’s hidden city ticket, a connection or just a straight direct flight. We can search the cheapest flight with this cypher query.

MATCH con= (:Month{month:6})-[:HAS_DAY]->(:Day{day:6})-[:HAS_AIRPORT]->(:Airport{code:'GSO'})-[:HAS_FLIGHT]->(:Flight{})-[f:TO]->(:Airport{code:'EWR'}) return con order by f.price limit 1

tl;dr

This fast and easy run down demonstrates that hidden city flights can efficiently be searched using Python and Neo4j. By using the information from openflights.org, we’ve been able to find some great airfares. There is still a lot of room for improvement, but I’ll cover that in another post.

Cheers Alex

PS: Thanks also Philippe Khin for his blog posts and code snippets about a different project, which had a lot of overlap with my own project. These came in handy!