We all want to find the best results for the lowest cost, but sometimes it’s not as simple as that.

Imagine you have a large number of choices from which you are trying to find the best option. For example, let’s say you have a range of energy efficiency refurbishment packages available and you want to minimise the installation cost while maximising the annual energy saved.


A two dimensional Pareto frontier is defined as the set of options for which you can’t improve on one metric without making the other metric worse. There are analogues in more than two dimensions but this snippet only works for the 2D case.

On our energy saving measures Pareto frontier, one end will be the measure which can save the largest amount of energy per year. The other end will be the cheapest measure.

To find the points in between that lie on the frontier there is a fairly simple algorithm to follow. First you need to sort your measures by one of the metrics. Let’s sort by cost, from cheapest to most expensive.

So with our energy saving measures sorted by cost, we start with the cheapest measure and work our way upwards in cost. If the next more expensive measure also saves more energy than the cheapest measure it belongs on the Pareto frontier. If it not only saves less and but is also more expensive then we can clearly throw it out (technically such options are called “dominated”).
Now continue in the same way, only comparing the next measure to the one you’ve just added to the frontier. Again if it saves more energy, add it to the frontier. Keep going in the same way until you reach the most expensive measure.

You may not always want to maximise one metric while minimising the other. Perhaps you want to find the greatest carbon saving for the highest financial return on investment. You could rejig your equations to give you the right kind of numbers, but this snippet gives you the option of specifying for which variables you want the largest values and for which you want the smallest values.

To plot that Pareto frontier you can use the matplotlib library.

One small observation worth making here is that a Pareto frontier like the one in our energy savings example gives us a way of placing a value on one metric in terms of the other. Just because all your non-dominated options lie on the Pareto frontier, doesn’t mean that there’s no way to choose between them. It just means that you can’t choose between them based solely on those two metrics.
By making a decision to install a particular measure you are implicitly putting a value on saving a given amount of energy per year.

It works with anything. You can even compare apples and orange. Put apples on the x-axis and oranges on the y-axis, choose an option that gives you some apples and some oranges and that tells you how many apples you think are worth how many oranges.

Jamie Bull | jamiebull1@gmail.com

Related Posts

WiGLE is a popular platform which can be used for finding the location of a device using the names of WiFi networks in its vicinity. I’ve written about this before, and wrote some Python code to interact with their API. This API has since been retired and replaced with a new one, as of December […]

Just a quick post to point out a couple of really useful tools.The first is a web-based tool for finding weather files for a location of interest. It’s similar to the Excel EPW finder tool we created a few years back, but much more modern looking. It is however missing a few of the useful […]

Eppy is a really useful library which I’ve written about several times, since before I really had anything to offer in terms of contributing code. Over the past year or so though, I’ve started to contribute back some of the changes and additions I’ve made while using eppy on academic and commercial projects.This post is […]

19 Comments on “Find Pareto frontiers in Python”

  • Beau says:

    Hi,

    do you have code to do this in 3 or more dimensions?

    best,
    Beau

    • Jamie Bull says:

      I’ll take a look. I think the algorithm should extend pretty simply.

      • Jamie Bull says:

        I went with a slightly different implementation as follows:

        You’ll need to pass it a Numpy array but this works really well in testing.

        • Greg says:

          Hello Jamie,

          I’d love to get this bit of code working but it doesn’t seem to go as-is. I set line 10 to read “if sum(row[x]) >= pareto_frontier[-1][x]:” but I don’t see where x is defined. Can you clarify?

          • Jamie Bull says:

            It’s defined in the line that follows – for x in range(len(row))]) == len(row):.

  • Greg says:

    Woops. Sorry to waste your time. I didn’t think to conceptualize lines 10 and 11 as a singular line while they show as two. I’m not a python expert and didn’t know that such things were allowed. Thanks!

  • meriem says:

    plz i dont undrestand something you say that if “we start with the cheapest measure and work our way upwards in cost. If the next more expensive measure also saves more energy than the cheapest measure it belongs on the Pareto frontier ” that means we put initially a in the set then we compare b if b>= a then we put also in the set of frontier but b>=a means that a will be called dominated plzz i have a conflict

    • Jamie Bull says:

      The point is that it is only dominated if there is an option which performs better on both metrics. In this case if there is another option that is both cheaper and saves more energy. If a is the cheapest then it is by definition the best-performing in terms of cost and so can’t be dominated by another option that saves more energy.

  • Joost says:

    I have tried to use your multi-dimensional snippet at: http://code.activestate.com/recipes/578287-multidimensional-pareto-front/. Unfortunately, the result is not as expected.

    The test() method (missing a colon after the function definition) shows that [1,1,1], [2,2,2], [3,3,3] and [4,4,4] are all Pareto (I interpret the result of the method as the Pareto points that were in the input set). These points can only be all Pareto if the numbers are partially ordered (and 1,2,3 and 4 cannot be ordered by it); however, in case of minimization or maximization I would expect respectively [1,1,1] and [4,4,4] to be the single Pareto points in the input set.

    I think you have almost implemented Simple Cull, but it’s just not finished yet. See https://etda.libraries.psu.edu/paper/6336/1601 for the PhD thesis describing Simple Cull.

  • Joost says:

    Here’s my modified version of your implementation: it uses Simple Cull, but is probably less pythonic in nature 😉

    You could simply change it for minimization, by sorting by ascending value and replacing the domination check =.

  • hamed says:

    Hi
    I want to write a problem with 3 objectives( 2 mins and a max objectives) with 3 variables in matlab program but i cant define it. If it is possible for you please help me that who i can define i t or send a problem like that.

    best
    hamed

  • Matteo says:

    Hi,
    Thank you very much for your code. It helped me a lot.
    I think I found a small error in it. If there are solutions with the same value of X and better value of Y, this algorithm save in the pareto front list also the solutions that should not be on the pareto front. Am I right?
    I have changed the code in this way. the problem seems to be solved, but I am a beginner in python. Do you think my way of thinking is correct?

  • Max says:

    If I run pareto_frontier(test[:,0], test[:,1], maxX = False, maxY = False) , (note maxX and maxY are both set to False, that means, I want to minimize both objectives), on the array: test = np.array( [[1/5,1],[1/4,1],[1/3,0.9],[0.5,1],[3/5,2/4],[1,1/3]] ), then I get as a result:

    ([0.20000000000000001, 0.25, 0.33333333333333331, 0.59999999999999998, 1.0],
    [1.0, 1.0, 0.90000000000000002, 0.5, 0.33333333333333331])

    That means according to the algorithm, the point [1/4,1] is also non dominated. But it in a minimization problem where both objectives have to be minimized, the point [1/5,1] dominates [1/4,1] , so [1/4,1] should not be in the output of the algorithm, should it?

  • Georges A. says:

    Hi , How can i use the multi dimensions algorithm with 3 objectives for example ( 2 mins and a max objectives) . The example you posted
    shows only for only maximizing all or minimizing all.

    Is there a way to do that ? Thank you .

    • Jamie Bull says:

      As we discussed over email (posting here in case anyone else sees it), the algorithm only works when maximising or minimising all objectives. To make it work, you need to invert some variables so they are all to be maximised or minimised

Leave a Reply

Your email address will not be published. Required fields are marked *