Problem 3

Challenge Title | Diversity in Cluster Analysis

Brief description of the challenge:

In Cluster Analysis, given a data set X and a distance matrix on X, one seeks a partition of X into p subsets such that elements in the same subset are close to each other and elements in different subsets are far from each other. At the same time, a representative of each subset is chosen.
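One common way to write this down, offered here only as an illustrative assumption and not as the formulation the challenge prescribes, is the p-median model. The symbols are introduced for this sketch: d_{ij} is the distance between elements i and j of X, y_j = 1 if element j is chosen as a representative, and x_{ij} = 1 if element i is assigned to representative j. Note that this model only captures within-subset closeness; separation between subsets is not modelled explicitly.

```latex
\begin{align*}
\min_{x,\,y} \quad & \sum_{i \in X} \sum_{j \in X} d_{ij}\, x_{ij} \\
\text{s.t.} \quad  & \sum_{j \in X} x_{ij} = 1        && \forall i \in X, \\
                   & x_{ij} \le y_j                   && \forall i, j \in X, \\
                   & \sum_{j \in X} y_j = p, \\
                   & x_{ij},\, y_j \in \{0, 1\}       && \forall i, j \in X.
\end{align*}
```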

Suppose fairness is a concern: the data include one or several categorical features (in the simplest case, a single binary variable) identifying sensitive attributes (e.g. race, gender, nationality). The representatives (prototypes) selected may not be diverse enough with respect to these features.

The challenge is to model the problem incorporating a diversity criterion for the representatives chosen. A mathematical optimization model is to be developed, and an algorithm is to be designed and implemented to address the challenge.
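As a starting point only, the sketch below shows one possible reading of a diversity criterion: each sensitive group must supply at least a given number of representatives. This is an assumption, since the challenge leaves the criterion open. The function name `diverse_medoids`, the parameter `min_per_group`, and the toy data are all hypothetical. The search is exhaustive, so it is suitable only for very small instances; a real solution would need a proper optimization model or heuristic.

```python
from collections import Counter
from itertools import combinations


def diverse_medoids(dist, groups, p, min_per_group):
    """Brute-force search for p representatives under a diversity constraint.

    dist          -- square matrix (list of lists) of pairwise distances
    groups        -- groups[i] is the sensitive category of element i
    p             -- number of representatives to choose
    min_per_group -- hypothetical rule: {category: minimum number of
                     representatives required from that category}

    Returns (representatives, labels, cost), or None if no choice of
    representatives satisfies the constraint.
    """
    n = len(dist)
    best_cost, best_reps = float("inf"), None
    for reps in combinations(range(n), p):
        # Discard candidate sets that violate the assumed diversity rule.
        counts = Counter(groups[j] for j in reps)
        if any(counts[g] < m for g, m in min_per_group.items()):
            continue
        # Each element is assigned to its closest representative.
        cost = sum(min(dist[i][j] for j in reps) for i in range(n))
        if cost < best_cost:
            best_cost, best_reps = cost, reps
    if best_reps is None:
        return None
    labels = [min(best_reps, key=lambda j: dist[i][j]) for i in range(n)]
    return best_reps, labels, best_cost


# Toy data (invented for illustration): six points on a line,
# with a single element belonging to category "b".
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
groups = ["a", "a", "a", "a", "a", "b"]

print(diverse_medoids(dist, groups, 2, {}))                # no diversity rule
print(diverse_medoids(dist, groups, 2, {"a": 1, "b": 1}))  # one per category
```

Comparing the two calls shows the trade-off the challenge asks about: enforcing diversity among the representatives can increase the total assignment distance.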

Mathematical background | Students need basic knowledge of a programming language and of basic statistics and/or data analysis.

 

Coordinator | Emilio Carrizosa, Departamento de Estadística e Investigación Operativa, Universidad de Sevilla, España