Skip to main content
  • Research Article
  • Published:

Segmentation of DNA into Coding and Noncoding Regions Based on Recursive Entropic Segmentation and Stop-Codon Statistics

Abstract

Heterogeneous DNA sequences can be partitioned into homogeneous domains that are comprised of the four nucleotides A, C, G, and T and the stop-codons. Recursively, we apply a new entropic segmentation method on DNA sequences using Jensen-Shannon and Jensen-Rényi divergences in order to find the borders between coding and noncoding DNA regions. We have chosen 12- and 18-symbol alphabets that capture (i) the differential nucleotide composition in codons, and (ii) the differential stop-codon composition along all the three phases in both strands of the DNA. The new segmentation method is based on the Jensen-Rényi divergence measure, nucleotide statistics, and stop-codon statistics in both DNA strands. The recursive segmentation process requires no prior training on known datasets. Consequently, for three entire genomes of bacteria, we find that the use of nucleotide composition, stop-codon composition, and Jensen-Rényi divergence improve the accuracy of finding the borders between coding and noncoding regions in DNA sequences.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Daniel Nicorici.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Nicorici, D., Astola, J. Segmentation of DNA into Coding and Noncoding Regions Based on Recursive Entropic Segmentation and Stop-Codon Statistics. EURASIP J. Adv. Signal Process. 2004, 832471 (2004). https://doi.org/10.1155/S1110865704309212

Download citation

  • Received:

  • Revised:

  • Published:

  • DOI: https://doi.org/10.1155/S1110865704309212

Keywords