by François Perruchas [1] - INGENIO [CSIC-UPV]

Some time ago, I was asked to create an Access macro to recover all item pairs which share a common element. For example, imagine you want to link cited articles through citing articles, and you want to know which articles are the most cited together (a pair).

We set one table with 2 columns, the first one is the citing article ID, the second one is the cited article ID, and then the macro calculates all the possible pairs, with one loop inside another. With an example it's easier to understand:

*Input Table*

ID Cited Article (IDCited) | ID Citing Article (IDCiting) |
---|---|

1 | A |

1 | B |

1 | C |

1 | D |

2 | E |

2 | A |

2 | B |

2 | D |

The first loop is used to get the first element of the pair, the second loop is to get the second element. As we don't want repetitions (AB is the same than BA), we have to loop from the next element until the last (with elements ordered).

In our example, the result will be:

For IDCited = 1, we will have AB, AC, AD, BC, BD, CD.

For IDCited = 2, we will have AB, AD, AE, BD, BE, DE.

But then somebody came and told me: *“I'm using your macro to have pairs of keywords to search for similar articles, but the search is too 'wide'. Is it possible to do the same but instead of pairs, to have group of 3, 4 or more elements?” *Indeed, it's a combination, as we studied in high school, written C(n,k) – and the pair we calculated before is just C(n,2). It's easy to calculate the number of possible combinations (see wikipedia [2]), but more difficult to produce the list.

The first solution that came to me was to do the same than with pairs, for 3 elements it will be 3 loops, 4 elements 4 loops, and so on... The problem here is that the number of elements is a parameter, so if we want to follow this logic, we need to write specific instructions for each value of k.

The second solution, as we need a loop in any case, is to use a “while” with a boolean variable to control the loop end. Then, after some thought, I realized that, following the same logic as with the pairs, getting all the possibles combinations is like counting, without repeating numbers and combinations.

Let's have another example, imagine your group is {1,2,3,4,5,6}, then all the possible combinations of C(6,2) will be:

12, 13, 14, 15, 16, 23, 24, 25, 26, 34, 35, 36, 45, 46, 56

As we can see from this example, the right numeral is always strictly superior to the value of the left numeral, and always less or equal to the number of elements (n). The left numeral is also always equal or inferior to the number of elements minus 1. When the numeral value is superior to the number of elements, we add 1 to the numeral to its left and reinitialize it.

All of this means that for every loop, we just need to add 1 to the far right numeral, and then check if each numeral follows these rules. We will have all the possible combinations when the far left numeral value is over the number of elements minus one.

The only problem here is that the cited articles are not integers but strings. The easiest way to sort this out is to use arrays: the cited articles will be the values and the numbers will be the keys, and then we will have another array that holds every “numeral value”, one per element.

Below is the part of the algorithm inside the loop that goes through every ID Citing

Dim Arr_Group() As String 'the array that receives the actual combination elements

Dim Arr_Elements() As String ' the array that holds all the elements ordered

Dim Arr_IdGroup() As Integer 'the array that holds the numeral values

Dim StopLoop As Boolean

Dim GroupSize As Integer 'GroupSize is set at the beginning

'First we initialize Arr_IdGroup that holds the numeral values

ReDim Arr_IdGroup(GroupSize - 1)

Arr_IdGroup(0) = 1

For i = 1 To GroupSize - 1

Arr_IdGroup(i) = Arr_IdGroup(i - 1) + 1

Next i

StopLoop = False

Do Until StopLoop

'We erase the array that holds the values of the group

Erase Arr_Group()

ReDim Arr_Group(GroupSize - 1)

'We get the actual combination

For i = 0 To GroupSize - 1

Arr_Group(i) = Arr_Elements(Arr_IdGroup(i) - 1)

Next i

'Here we apply the rules defined before - from right to left

For i = 0 To GroupSize - 1

'We add 1 to the far right numeral

If i = 0 Then Arr_IdGroup(GroupSize - i - 1) = Arr_IdGroup(GroupSize - i - 1) + 1

'Then we check the 3 rules for the actual numeral

'If it's above the number of elements, then we add + 1 to the left numeral and we reinitialize right numeral

If Arr_IdGroup(GroupSize - i - 1) > (UBound(Arr_Elements) - i) Then

'We check if it's not the far left numeral

If GroupSize - i - 2 > -1 Then

Arr_IdGroup(GroupSize - i - 2) = Arr_IdGroup(GroupSize - i - 2) + 1

Arr_IdGroup(GroupSize - i - 1) = Arr_IdGroup(GroupSize - i - 2) + 1

Else

'Because if so it means that we already get all the possible combinations, so we stop the loop

StopLoop = True

End If

'Here we reinitialize the actual numeral if it's superior to the limit

If Arr_IdGroup(GroupSize - i - 1) > (UBound(Arr_Elements) - i) Then

Arr_IdGroup(GroupSize - i - 1) = UBound(Arr_Elements) - i

End If

End If

Next i

'Here you just have to save the array Arr_Group

Loop

Then of course you need to implement the other part of the code.