WiGLE is a popular platform which can be used for finding the location of a device using the names of WiFi networks in its vicinity. I’ve written about this before, and wrote some Python code to interact with their API. This API has since been retired and replaced with a new one, as of December […]

We all want to find the best results for the lowest cost, but sometimes it’s not as simple as that.

Imagine you have a large number of choices from which you are trying to find the best option. For example, let’s say you have a range of energy efficiency refurbishment packages available and you want to minimise the installation cost while maximising the annual energy saved.

A two dimensional Pareto frontier is defined as the set of options for which you can’t improve on one metric without making the other metric worse. There are analogues in more than two dimensions but this snippet only works for the 2D case.

On our energy saving measures Pareto frontier, one end will be the measure which can save the largest amount of energy per year. The other end will be the cheapest measure.

To find the points in between that lie on the frontier there is a fairly simple algorithm to follow. First you need to sort your measures by one of the metrics. Let’s sort by cost, from cheapest to most expensive.

So with our energy saving measures sorted by cost, we start with the cheapest measure and work our way upwards in cost. If the next more expensive measure also saves more energy than the cheapest measure it belongs on the Pareto frontier. If it not only saves less and but is also more expensive then we can clearly throw it out (technically such options are called “dominated”).

Now continue in the same way, only comparing the next measure to the one you’ve just added to the frontier. Again if it saves more energy, add it to the frontier. Keep going in the same way until you reach the most expensive measure.

You may not always want to maximise one metric while minimising the other. Perhaps you want to find the greatest carbon saving for the highest financial return on investment. You could rejig your equations to give you the right kind of numbers, but this snippet gives you the option of specifying for which variables you want the largest values and for which you want the smallest values.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
''' Method to take two equally-sized lists and return just the elements which lie on the Pareto frontier, sorted into order. Default behaviour is to find the maximum for both X and Y, but the option is available to specify maxX = False or maxY = False to find the minimum for either or both of the parameters. ''' def pareto_frontier(Xs, Ys, maxX = True, maxY = True): # Sort the list in either ascending or descending order of X myList = sorted([[Xs[i], Ys[i]] for i in range(len(Xs))], reverse=maxX) # Start the Pareto frontier with the first value in the sorted list p_front = [myList[0]] # Loop through the sorted list for pair in myList[1:]: if maxY: if pair[1] >= p_front[-1][1]: # Look for higher values of Y… p_front.append(pair) # … and add them to the Pareto frontier else: if pair[1] <= p_front[-1][1]: # Look for lower values of Y… p_front.append(pair) # … and add them to the Pareto frontier # Turn resulting pairs back into a list of Xs and Ys p_frontX = [pair[0] for pair in p_front] p_frontY = [pair[1] for pair in p_front] return p_frontX, p_frontY |

To plot that Pareto frontier you can use the matplotlib library.

1 2 3 4 5 6 7 8 9 |
import matplotlib.pyplot as plt Xs, Ys = # get your data from somewhere to go here # Find lowest values for cost and highest for savings p_front = pareto_frontier(Xs, Ys, maxX = False, maxY = True) # Plot a scatter graph of all results plt.scatter(Xs, Ys) # Then plot the Pareto frontier on top plt.plot(p_front[0], p_front[1]) plt.show() |

One small observation worth making here is that a Pareto frontier like the one in our energy savings example gives us a way of placing a value on one metric in terms of the other. Just because all your non-dominated options lie on the Pareto frontier, doesn’t mean that there’s no way to choose between them. It just means that you can’t choose between them based solely on those two metrics.

By making a decision to install a particular measure you are implicitly putting a value on saving a given amount of energy per year.

It works with anything. You can even compare apples and orange. Put apples on the x-axis and oranges on the y-axis, choose an option that gives you some apples and some oranges and that tells you how many apples you think are worth how many oranges.

## 17 Comments on “Find Pareto frontiers in Python”

Hi,

do you have code to do this in 3 or more dimensions?

best,

Beau

I’ll take a look. I think the algorithm should extend pretty simply.

I went with a slightly different implementation as follows:

You’ll need to pass it a Numpy array but this works really well in testing.

Hello Jamie,

I’d love to get this bit of code working but it doesn’t seem to go as-is. I set line 10 to read “if sum(row[x]) >= pareto_frontier[-1][x]:” but I don’t see where x is defined. Can you clarify?

It’s defined in the line that follows –

`for x in range(len(row))]) == len(row):`

.Woops. Sorry to waste your time. I didn’t think to conceptualize lines 10 and 11 as a singular line while they show as two. I’m not a python expert and didn’t know that such things were allowed. Thanks!

plz i dont undrestand something you say that if “we start with the cheapest measure and work our way upwards in cost. If the next more expensive measure also saves more energy than the cheapest measure it belongs on the Pareto frontier ” that means we put initially a in the set then we compare b if b>= a then we put also in the set of frontier but b>=a means that a will be called dominated plzz i have a conflict

The point is that it is only dominated if there is an option which performs better on both metrics. In this case if there is another option that is both cheaper and saves more energy. If a is the cheapest then it is by definition the best-performing in terms of cost and so can’t be dominated by another option that saves more energy.

I have tried to use your multi-dimensional snippet at: http://code.activestate.com/recipes/578287-multidimensional-pareto-front/. Unfortunately, the result is not as expected.

The test() method (missing a colon after the function definition) shows that [1,1,1], [2,2,2], [3,3,3] and [4,4,4] are all Pareto (I interpret the result of the method as the Pareto points that were in the input set). These points can only be all Pareto if the numbers are partially ordered (and 1,2,3 and 4 cannot be ordered by it); however, in case of minimization or maximization I would expect respectively [1,1,1] and [4,4,4] to be the single Pareto points in the input set.

I think you have almost implemented Simple Cull, but it’s just not finished yet. See https://etda.libraries.psu.edu/paper/6336/1601 for the PhD thesis describing Simple Cull.

Here’s my modified version of your implementation: it uses Simple Cull, but is probably less pythonic in nature 😉

You could simply change it for minimization, by sorting by ascending value and replacing the domination check =.

See http://pythonfiddle.com/pareto-simple-cull for the code.

Hi

I want to write a problem with 3 objectives( 2 mins and a max objectives) with 3 variables in matlab program but i cant define it. If it is possible for you please help me that who i can define i t or send a problem like that.

best

hamed

Sorry, I missed this comment before. Also, I’ve never used Matlab so don’t really think I can help.

Hi,

Thank you very much for your code. It helped me a lot.

I think I found a small error in it. If there are solutions with the same value of X and better value of Y, this algorithm save in the pareto front list also the solutions that should not be on the pareto front. Am I right?

I have changed the code in this way. the problem seems to be solved, but I am a beginner in python. Do you think my way of thinking is correct?

If I run pareto_frontier(test[:,0], test[:,1], maxX = False, maxY = False) , (note maxX and maxY are both set to False, that means, I want to minimize both objectives), on the array: test = np.array( [[1/5,1],[1/4,1],[1/3,0.9],[0.5,1],[3/5,2/4],[1,1/3]] ), then I get as a result:

([0.20000000000000001, 0.25, 0.33333333333333331, 0.59999999999999998, 1.0],

[1.0, 1.0, 0.90000000000000002, 0.5, 0.33333333333333331])

That means according to the algorithm, the point [1/4,1] is also non dominated. But it in a minimization problem where both objectives have to be minimized, the point [1/5,1] dominates [1/4,1] , so [1/4,1] should not be in the output of the algorithm, should it?

So it is assumed here, that all points have different x coordintes and different y coordinates, right?

Hi Max,

I’ve not looked at this for several years and don’t have the time right now, but I’d suggest using the function in joost’s comment http://oco-carbon.com/metrics/find-pareto-frontiers-in-python/#comment-11501