
I have a reasonably sized set N (say 10,000 objects) in which I am searching for groups of compatible elements. That is, I have a function y = f(x_1, x_2, x_3, ..., x_n) returning a boolean 0/1 answer indicating whether the n given elements are compatible. I am interested in running this search over every subset of N with fewer than 8 elements, which is obviously NP-hard or close to it. Even a pairwise search over an n-element set already has on the order of n^2 pairs to explore, which is somewhat doable yet still grows quickly for big sets. It then gets nasty when looking for triples (identifying 3-element cliques), quadruples, or larger groups.
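
For reference, this is roughly the brute-force enumeration I am trying to avoid; `f` here is just a dummy stand-in for my real (expensive) compatibility test, and the element list is placeholder data:

```python
from itertools import combinations
from math import comb

def f(subset):
    """Dummy stand-in for the real, expensive compatibility test."""
    return sum(subset) % 7 == 0

elements = list(range(10_000))

# The number of candidate subsets per size is what explodes:
for k in range(2, 8):
    print(k, comb(len(elements), k))

# Exhaustive search for triples on a small slice, just to show the pattern:
sample = elements[:100]
compatible_triples = [s for s in combinations(sample, 3) if f(s)]
print(len(compatible_triples))
```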

Would you suggest any machine learning approach to help with this task? Perhaps for efficiently sampling the search space and identifying which subsets to search, or for fast heuristic evaluations of the function to direct the search.

  • I don't think any ML can help without additional simplifications and/or data-specific relations. For example, can we assume that there is some binary relation between pairs of compatible elements, and that this relation is transitive? That would at least allow you to make the process incremental. Another important question is whether it is a requirement to find all such compatible groups: if yes, that would require an exhaustive search anyway (assuming there are no additional constraints that can be used). – Erwan Jul 29 '19 at 13:56
  • Thanks. I did manage to train a classifier on the generic attributes of the elements that predicts whether they are compatible quite well; it reduced the search by an order of magnitude, but it still requires going through all pairs. If I want to go through all the triples... well, that is something we do not want. – Intelligent-Infrastructure Jul 29 '19 at 14:02
  • I can only give you vague ideas: one would be to see whether it is possible to apply clustering to the elements in order to reduce the number of candidate sets (a rough sketch of that idea follows these comments). Another idea would be genetic algorithms, but I think that would make sense only if the goal is to find *some* compatible groups, not all of them. – Erwan Jul 29 '19 at 14:23
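
A rough sketch of the clustering idea from the comments, combined with the cheap learned pair filter the asker describes. Everything here is hypothetical: `X` is placeholder attribute data, `f` is a dummy stand-in for the exact compatibility test, and `cheap_pair_score` stands in for the trained classifier; only pairs inside the same cluster that pass the cheap filter are handed to `f`.

```python
import numpy as np
from itertools import combinations
from sklearn.cluster import KMeans

# Hypothetical inputs: one attribute vector per element, an expensive exact
# oracle f, and a cheap learned pair scorer (the classifier mentioned above).
rng = np.random.default_rng(0)
X = rng.random((2_000, 16))                    # placeholder attribute vectors

def f(indices):
    """Dummy stand-in for the exact (expensive) compatibility test."""
    return np.std(X[list(indices)]) < 0.2

def cheap_pair_score(i, j):
    """Stand-in for the trained pairwise classifier."""
    return -np.linalg.norm(X[i] - X[j])

# 1) Cluster the elements so that candidate groups are only searched
#    within a cluster instead of across the whole set.
n_clusters = 50
labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)

compatible_pairs = []
for c in range(n_clusters):
    members = np.flatnonzero(labels == c)
    for i, j in combinations(members, 2):
        # 2) Cheap learned filter first; the exact f only sees survivors.
        if cheap_pair_score(i, j) > -1.0 and f((i, j)):
            compatible_pairs.append((i, j))

print(len(compatible_pairs))
```

Whether this helps depends entirely on compatible elements actually landing in the same cluster, which is an assumption rather than a guarantee.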

1 Answer


This is not ML, but your setup reminds me of the Covering Array Problem, which is used in software testing.

If you consider your configuration arrays [0 0 1 0 1 0 1 0 1 ...] as tests to run, a typical question is: what is the minimum number of tests I need to run while still covering most of the distinct setups? This matters when you want to increase test completeness but also minimize the time it takes to run the tests.

Possibly some covering array solutions might help. There are even pre-computed tables you can use for different values of N and coverage strengths: https://math.nist.gov/coveringarrays/coveringarray.html
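
For intuition, here is a minimal check of what "strength-t coverage" means, assuming binary 0/1 factors (the function name and the example rows are just for illustration):

```python
from itertools import combinations

def covers_strength_t(rows, t):
    """True if every choice of t columns exhibits all 2**t binary patterns."""
    k = len(rows[0])
    for cols in combinations(range(k), t):
        seen = {tuple(row[c] for c in cols) for row in rows}
        if len(seen) < 2 ** t:          # assumes entries are binary (0/1)
            return False
    return True

# Five rows over four binary factors that cover every pairwise combination:
rows = [
    (0, 0, 0, 0),
    (0, 1, 1, 1),
    (1, 0, 1, 1),
    (1, 1, 0, 1),
    (1, 1, 1, 0),
]
print(covers_strength_t(rows, 2))   # True: a strength-2 covering array
```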

And as you pointed out, this is a combinatorial problem that suffers from combinatorial explosion. You might want to divide and conquer (split N into k manageable groups).
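
A minimal sketch of that divide-and-conquer idea, with hypothetical chunk sizes; note that it silently skips any group whose members end up in different chunks:

```python
from itertools import combinations

def search_in_chunks(elements, f, chunk_size=100, max_size=3):
    """Test only the subsets that fit entirely inside one chunk.

    This keeps the enumeration manageable but is a heuristic: groups
    whose members land in different chunks are never examined.
    """
    hits = []
    for start in range(0, len(elements), chunk_size):
        chunk = elements[start:start + chunk_size]
        for size in range(2, max_size + 1):
            hits.extend(s for s in combinations(chunk, size) if f(s))
    return hits

# Example with a dummy oracle:
found = search_in_chunks(list(range(1_000)), lambda s: sum(s) % 997 == 0)
print(len(found))
```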

skadio