A simple means of finding mutations
An algorithm that compares genomes to find serious mutations
Gene mapping, linkage analysis, sequence comparisons – these three terms stand for the long and difficult search for the genetic mutation behind an interesting phenotype. For a long time, scientific attempts to identify relevant mutations could only be described as piecemeal. The search for causal mutations was made simpler by the sequencing of entire genomes. Reconstructing a genome from sequencing data, however, requires the complete sequence of a representative individual, i.e. the reference sequence. As there is no matching reference sequence for every plant, the search for relevant mutations remains very difficult to this day.
Korbinian Schneeberger, George Coupland and their colleagues have now developed a method that does not need reference sequences. It is based on the simple premise that the DNA of the parental plant differs from the DNA of the mutants at the relevant mutation, and so it compares these closely related genomes directly. If an algorithm removes the identical sequences, only those that distinguish the two genomes are left. These are analysed using so-called “k-mers”: short fragments, roughly thirty base pairs in length, that can be counted and grouped very easily and efficiently. All identical k-mers, i.e. all identical DNA sequences, are grouped together in a stack. As fragments carrying the relevant mutation have a different sequence from the parental sequence, a new k-mer stack is opened for their specific sequence information. In the end, the new algorithm shows which new stacks have arisen from the comparison and the genes that they belong to.
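The stacking idea can be illustrated with a few lines of Python. This is only a minimal sketch and not the NIKS implementation: the function names `count_kmers` and `new_stacks`, the toy reads and the choice of k = 4 are assumptions made for illustration, and the sketch ignores details a real tool has to handle, such as reads from the reverse strand.

```python
from collections import Counter


def count_kmers(reads, k):
    """Group identical k-mers into 'stacks' by counting each k-length substring."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def new_stacks(parent_reads, mutant_reads, k):
    """Return the mutant k-mer stacks (k-mer -> count) that have no parental counterpart."""
    parent = count_kmers(parent_reads, k)
    mutant = count_kmers(mutant_reads, k)
    return {kmer: n for kmer, n in mutant.items() if kmer not in parent}


# Toy example (hypothetical sequences): one G -> A change at the seventh base.
parent_reads = ["ACGTACGTTGCA"]
mutant_reads = ["ACGTACATTGCA"]

print(sorted(new_stacks(parent_reads, mutant_reads, k=4)))
# prints ['ACAT', 'ATTG', 'CATT', 'TACA']
```

A single changed base creates k new k-mers, one for every window that overlaps the changed position, which is why the mutation stands out once all shared stacks have been removed.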
How do Schneeberger and his colleagues ensure that they do not end up spending their time on irrelevant mutations or sequencing errors in the genome comparison? “To exclude these sources of interference, there are various strategies that can be used. These are even applied to a certain extent during the actual comparison,” says Schneeberger. “We have to exclude non-causal mutations at an early stage.” When the genomes are sequenced, the genetic information is read a number of times. Sequencing errors only appear from time to time and not always in the same place. The k-mers they produce are therefore rare and can be eliminated from the k-mer stacks. The exclusion of irrelevant mutations is more difficult. For this task, the selection of the plant material is important. Either two mutants are compared with one another (in which the same gene has demonstrably mutated) or the parental plant is compared with mutant pools. Mutant pools result from a cross between the parental plant and mutants and represent the F2 generation. Each plant in these pools has exactly the same mutation for the new phenotype. The causal mutation is therefore in the majority compared to irrelevant mutations. This means that the irrelevant mutations are rare and can be eliminated from the k-mer stacks. “We gave the new method the name NIKS,” says Karl Nordström, the developer behind the algorithm. “NIKS for ‘needle in the k-stack’.”
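The filtering step described here can be sketched as a simple count threshold. Again, this is an illustration under stated assumptions rather than the published algorithm: the function `candidate_kmers`, the `min_count` parameter and the toy reads are hypothetical, and a fixed cut-off stands in for whatever criteria NIKS actually applies.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-length substring across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def candidate_kmers(parent_reads, pool_reads, k, min_count):
    """Keep pool k-mers that are absent from the parent and seen at least min_count times.

    min_count is an assumed, user-chosen threshold: k-mers that occur less
    often are treated as sequencing errors or rare, irrelevant mutations.
    """
    parent = count_kmers(parent_reads, k)
    pool = count_kmers(pool_reads, k)
    return {kmer for kmer, n in pool.items()
            if n >= min_count and kmer not in parent}


# Toy data (hypothetical): the mutation is read three times, an error only once.
parent_reads = ["ACGTACGTTGCA"] * 3
pool_reads = ["ACGTACATTGCA"] * 3 + ["ACGTACGTTGCC"]  # last read ends in an error

print(sorted(candidate_kmers(parent_reads, pool_reads, k=4, min_count=2)))
# prints ['ACAT', 'ATTG', 'CATT', 'TACA']
```

With `min_count=1` the erroneous k-mer `TGCC` would also be reported; raising the threshold removes it while the k-mers that cover the well-supported mutation remain.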
If you compare genomes of parental plants with genomes from the mutant pools, you will find the relevant mutation in a k-mer stack. The stack will be missing from the parental plants, but present in the mutant pool. If you compare two plants with different mutations in one and the same gene, you will see which new k-mer stacks belong to the same gene in both plants. “Our method is so robust that astonishingly few false positive results are produced,” says Schneeberger, commenting on the potential of NIKS. “The percentage of correctly identified mutations is well over 98 percent. And all this without the aid of a reference sequence.”
The bioinformatics specialist and his team have tested the new method in different ways. First, well-known mutations in rice were confirmed. Schneeberger and Coupland then looked for unknown mutations in the alpine rockcress Arabis alpina. A special feature of this plant is that it normally only flowers once it has been exposed to the cold of winter. Maria Albani and George Coupland isolated a mutant that no longer depends on the cold stimulus. “Using NIKS, we found the causal mutation among more than 350 million bases. This shows that we can find new and relevant mutations without having to resort to using a reference sequence,” says Schneeberger. “The greatest value of NIKS will be in the ability to get to the relevant mutation faster in an unknown genome.” The scientists in Cologne even see a new field of work here, as many interesting phenotypes, such as resistance to pests, only appear in species that are rarely studied and for which there are no reference sequences.