This is best described with an example. Given the paragraph:
The longest string in this paragraph is not the shortest string in the paragraph because it is the longest string in the paragraph
I want to list the matching substrings ordered first by frequency and then by length. In this case (case-insensitive), the list should be:
The longest string in
the paragraph
is not the shortest string in
because
it is
this
The list above orders the substrings by how often they occur, then by length. "The longest string in" is repeated twice and is the longest repeated substring, so it comes first. "is not the shortest string in" is longer than "the paragraph", but "the paragraph" is repeated twice, so it is listed first.
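To make the ordering concrete, here is a minimal Python sketch of just the sorting step. It assumes the candidate substrings are already known (finding them is the actual question), and it assumes matches are counted case-insensitively on whole words. The names `count_occurrences` and `rank_by_frequency_then_length` are made up for illustration.

```python
import re

PARAGRAPH = (
    "The longest string in this paragraph is not the shortest string in "
    "the paragraph because it is the longest string in the paragraph"
)

def count_occurrences(phrase, text):
    """Count case-insensitive, whole-word occurrences of phrase in text."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

def rank_by_frequency_then_length(phrases, text):
    """Sort by descending frequency, breaking ties by descending length."""
    return sorted(
        phrases,
        key=lambda p: (-count_occurrences(p, text), -len(p)),
    )

# Candidate substrings, deliberately given out of order.
candidates = [
    "this",
    "because",
    "the paragraph",
    "it is",
    "is not the shortest string in",
    "The longest string in",
]

ranked = rank_by_frequency_then_length(candidates, PARAGRAPH)
for phrase in ranked:
    print(count_occurrences(phrase, PARAGRAPH), phrase)
```

With these candidates, the two substrings that occur twice come first (longest first), followed by the single-occurrence substrings from longest to shortest.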
Update (based on observations by AlexC and MattBurland):
Even if a substring such as the space character or "in" occurs more often than other substrings, it should not be listed if it is already included in a substring that is longer than its occurrences * length. For example, "in" occurs 3 times, which is 6 characters in total (9 including the trailing spaces), but since 9 characters is shorter than "the paragraph", it is not listed. I hope this makes sense?
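The arithmetic in the update can be written out as a small check. This is only one possible reading of the rule: it assumes a short substring's weight is its occurrence count times its length plus one trailing space, and that it is dropped when that weight is smaller than the length of a longer listed substring. The helper names `count_occurrences` and `is_outweighed` are hypothetical.

```python
import re

PARAGRAPH = (
    "The longest string in this paragraph is not the shortest string in "
    "the paragraph because it is the longest string in the paragraph"
)

def count_occurrences(phrase, text):
    """Count case-insensitive, whole-word occurrences of phrase in text."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

def is_outweighed(short, longer, text):
    """Return True if short should be dropped in favour of longer.

    Assumed rule: weight = occurrences * (length + 1 trailing space),
    and short is dropped when that weight is below len(longer).
    """
    weight = count_occurrences(short, text) * (len(short) + 1)
    return weight < len(longer)

# "in" occurs 3 times: 3 * (2 + 1) = 9, and "the paragraph" has 13 characters.
drop_in = is_outweighed("in", "the paragraph", PARAGRAPH)
print(drop_in)
```

Under this reading, "in" is dropped because its weight of 9 is less than the 13 characters of "the paragraph".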