Searls, D. B. (2002) The language of genes. Nature, 420:211--217.

References (may not be complete)

1. Aitchison, J. Linguistics (NTC/Contemporary Publishing, Chicago, 1999).

2. Chomsky, N. Syntactic Structures (Mouton, The Hague, 1957).

3. Jurafsky, D. & Martin, J. H. Speech and Language Processing (Prentice Hall, Upper Saddle River, NJ, 2000).

4. Brendel, V. & Busse, H. G. Genome structure described by formal languages. Nucleic Acids Res. 12, 2561-2568 (1984).

5. Head, T. Formal language theory and DNA: an analysis of the generative capacity of specific recombinant behaviors. Bull. Math. Biol. 49, 737-759 (1987).

6. Searls, D. B. in Proc. 7th Natl Conf. Artif. Intell. 386-391 (AAAI Press, Menlo Park, CA, 1988).

7. Searls, D. B. The linguistics of DNA. Am. Sci. 80, 579-591 (1992).

8. Searls, D. B. in Logic Programming: Proc. North Am. Conf. (eds Lusk, E. & Overbeek, R.) 189-208 (MIT Press, Cambridge, MA, 1989).

9. Searls, D. B. in Artificial Intelligence and Molecular Biology Ch. 2 (ed. Hunter, L.) 47-120 (AAAI Press, Menlo Park, CA, 1993).

10. Searls, D. B. in Mathematical Support for Molecular Biology (eds Farach-Colton, M., Roberts, F. S., Vingron, M. & Waterman, M.) 117-140 (American Mathematical Society, Providence, RI, 1999).

11. Durbin, R., Krogh, A., Mitchison, G. & Eddy, S. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids (Cambridge Univ. Press, Cambridge, 1998).

12. Baldi, P. & Brunak, S. Bioinformatics: The Machine Learning Approach (MIT Press, Cambridge, MA, 2001).

13. Lyngso, R. B. & Pedersen, C. N. RNA pseudoknot prediction in energy-based models. J. Comput. Biol. 7, 409-427 (2000).

14. Joshi, A. in Natural Language Processing: Psycholinguistic, Computational and Theoretical Perspectives (eds Dowty, D., Karttunen, L. & Zwicky, A.) 206-250 (Chicago Univ. Press, New York, 1985).

15. Uemura, Y., Hasegawa, A., Kobayashi, S. & Yokomori, T. Tree-adjoining grammars for RNA structure prediction. Theor. Comput. Sci. 210, 277-303 (1999).

16. Searls, D. B. String Variable Grammar: a logic grammar formalism for DNA sequences. J. Logic Program. 24, 73-102 (1995).

17. Rivas, E. & Eddy, S. R. The language of RNA: a formal grammar that includes pseudoknots. Bioinformatics 16, 334-340 (2000).

18. Shieber, S. Evidence against the context-freeness of natural language. Linguist. Phil. 8, 333-343 (1985).

19. Schultz, J., Milpetz, F., Bork, P. & Ponting, C. P. SMART, a simple modular architecture research tool: identification of signalling domains. Proc. Natl Acad. Sci. USA 95, 5857-5864 (1998).

20. Westhead, D. R., Slidel, T. W., Flores, T. P. & Thornton, J. M. Protein structural topology: automated analysis and diagrammatic representation. Protein Sci. 8, 897-904 (1999).

21. Abe, N. & Mamitsuka, H. Predicting protein secondary structure using stochastic tree grammars. Machine Learn. 29, 275-301 (1997).

22. Przytycka, T., Srinivasan, R. & Rose, G. D. Recursive domains in proteins. Protein Sci. 11, 409-417 (2002).

23. Jung, J. & Lee, B. Circularly permuted proteins in the protein structure database. Protein Sci. 10, 1881-1886 (2001).

24. Hopcroft, J. E. & Ullman, J. D. Introduction to Automata Theory, Languages, and Computation (Addison-Wesley, Reading, MA, 1979).

25. Searls, D. B. Reading the book of life. Bioinformatics 17, 579-580 (2001).

26. Dong, S. & Searls, D. B. Gene structure prediction by linguistic methods. Genomics 23, 540-551 (1994).

27. Searls, D. B. Linguistic approaches to biological sequences. Comput. Appl. Biosci. 13, 333-344 (1997).

28. Collado-Vides, J. A transformational-grammar approach to the study of the regulation of gene expression. J. Theor. Biol. 136, 403-425 (1989).

29. Rosenblueth, D. A. et al. Syntactic recognition of regulatory regions in Escherichia coli. Comput. Appl. Biosci. 12, 15-22 (1996).

30. Leung, S., Mellish, C. & Robertson, D. Basic Gene Grammars and DNA-ChartParser for language processing of Escherichia coli promoter DNA sequences. Bioinformatics 17, 226-236 (2001).

31. Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78-94 (1997).

32. Reese, M. G., Kulp, D., Tammana, H. & Haussler, D. Genie--gene finding in Drosophila melanogaster. Genome Res. 10, 529-538 (2000).

33. Yandell, M. D. & Majoros, W. H. Genomics and natural language processing. Nature Rev. Genet. 3, 601-610 (2002).

34. Sakakibara, Y. et al. Stochastic context-free grammars for tRNA modeling. Nucleic Acids Res. 22, 5112-5120 (1994).

35. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955-964 (1997).

36. Rivas, E. & Eddy, S. R. Noncoding RNA gene detection using comparative sequence analysis. BMC Bioinformatics 2, 8 (2001).

37. Knudsen, B. & Hein, J. RNA secondary structure prediction using stochastic context-free grammars and evolutionary history. Bioinformatics 15, 446-454 (1999).

38. Brown, M. P. Small subunit ribosomal RNA modeling using stochastic context-free grammars. Proc. Int. Conf. Intell. Syst. Mol. Biol. 8, 57-66 (2000).

39. Holmes, I. & Rubin, G. M. Pairwise RNA structure comparison with stochastic context-free grammars. Pac. Symp. Biocomput. 163-174 (2002).

40. Brown, M. & Wilson, C. RNA pseudoknot modeling using intersections of stochastic context free grammars with applications to database search. Pac. Symp. Biocomput. 109-125 (1996).

41. Campbell, L. Historical Linguistics: An Introduction (MIT Press, Cambridge, MA, 1999).

42. Darwin, C. The Descent of Man (John Murray, London, 1871).

43. Dawkins, R. The Selfish Gene (Oxford Univ. Press, Oxford, 1976).

44. Nowak, M. A., Komarova, N. L. & Niyogi, P. Computational and evolutionary aspects of language. Nature 417, 611-617 (2002).

45. Pennock, R. T. Tower of Babel: The Evidence against the New Creationism (Bradford/MIT Press, Cambridge, MA, 1999).

46. Cavalli-Sforza, L. L. Genes, Peoples, and Languages (North Point Press, New York, 2000).

47. Warnow, T. Mathematical approaches to comparative linguistics. Proc. Natl Acad. Sci. USA 94, 6585-6590 (1997).

48. Swadesh, M. Lexicostatistical dating of prehistoric ethnic contacts: with special reference to North American Indians and Eskimos. Proc. Am. Phil. Soc. 96, 452-463 (1952).

49. Kruskal, J. B., Dyen, I. & Black, P. in Lexicostatistics in Genetic Linguistics (ed. Dyen, I.) 30-55 (Mouton, The Hague, 1973).

50. Mushegian, A. The minimal genome concept. Curr. Opin. Genet. Dev. 9, 709-714 (1999).

51. Tatusov, R. L., Galperin, M. Y., Natale, D. A. & Koonin, E. V. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 28, 33-36 (2000).

52. Snel, B., Bork, P. & Huynen, M. A. Genome phylogeny based on gene content. Nature Genet. 21, 108-110 (1999).

53. Tekaia, F., Lazcano, A. & Dujon, B. The genomic tree as revealed from whole proteome comparisons. Genome Res. 9, 550-557 (1999).

54. Lin, J. & Gerstein, M. Whole-genome trees based on the occurrence of folds and orthologs: implications for comparing genomes on different levels. Genome Res. 10, 808-818 (2000).

55. Pellegrini, M. et al. Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc. Natl Acad. Sci. USA 96, 4285-4288 (1999).

56. McWhorter, J. H. The Power of Babel: A Natural History of Language 128-129 (Freeman, New York, 2001).

57. Searls, D. B. From Jabberwocky to genome: Lewis Carroll and computational biology. J. Comput. Biol. 8, 339-348 (2001).

58. Lupas, A. N., Ponting, C. P. & Russell, R. B. On the evolution of protein folds: are similar motifs in different protein folds the result of convergence, insertion, or relics of an ancient peptide world? J. Struct. Biol. 134, 191-203 (2001).

59. McKeown, K. R. & Radev, D. R. in A Handbook of Natural Language Processing (eds Dale, R., Moisl, H. & Somers, H.) 507-523 (Dekker, New York, 2000).

60. Marcotte, E. M. et al. Detecting protein function and protein-protein interactions from genome sequences. Science 285, 751-753 (1999).

61. Smadja, F. Retrieving collocations from text: XTRACT. Comput. Linguist. 19, 143-177 (1993).

62. Rudman, J. The state of authorship attribution studies: some problems and solutions. Comput. Humanities 31, 351-365 (1998).

63. Barnbrook, G. Language and Computers (Edinburgh Univ. Press, Edinburgh, 1996).

64. Zipf, G. K. Human Behavior and the Principle of Least Effort (Addison-Wesley, Boston, MA, 1949).

65. Mandelbrot, B. The Fractal Geometry of Nature (Freeman, San Francisco, 1983).

66. Mantegna, R. N. et al. Linguistic features of noncoding DNA sequences. Phys. Rev. Lett. 73, 3169-3172 (1994).

67. Huynen, M. A. & van Nimwegen, E. The frequency distribution of gene family sizes in complete genomes. Mol. Biol. Evol. 15, 583-589 (1998).

68. Harrison, P. M. & Gerstein, M. Studying genomes through the aeons: protein families, pseudogenes and proteome evolution. J. Mol. Biol. 318, 1155-1174 (2002).

69. Qian, J., Luscombe, N. M. & Gerstein, M. Protein family and fold occurrence in genomes: power-law behaviour and evolutionary model. J. Mol. Biol. 313, 673-681 (2001).

70. Schuster, P., Fontana, W., Stadler, P. F. & Hofacker, I. L. From sequences to shapes and back: a case study in RNA secondary structures. Proc. R. Soc. Lond. B 255, 279-284 (1994).

71. Hoyle, D. C., Rattray, M., Jupp, R. & Brass, A. Making sense of microarray data distributions. Bioinformatics 18, 576-584 (2002).

72. Rzhetsky, A. & Gomez, S. M. Birth of scale-free molecular networks and the number of distinct DNA and protein domains per genome. Bioinformatics 17, 988-996 (2001).

73. Jeong, H. et al. The large-scale organization of metabolic networks. Nature 407, 651-654 (2000).

74. Park, J., Lappe, M. & Teichmann, S. A. Mapping protein family interactions: intramolecular and intermolecular protein family interaction repertoires in the PDB and yeast. J. Mol. Biol. 307, 929-938 (2001).

75. Garcia-Vallve, S., Romeu, A. & Palau, J. Horizontal gene transfer in bacterial and archaeal complete genomes. Genome Res. 10, 1719-1725 (2000).

76. White, O. et al. A quality control algorithm for DNA sequencing projects. Nucleic Acids Res. 21, 3829-3838 (1993).

77. Hoover, D. L. Statistical stylistics and authorship attribution: an empirical investigation. Lit. Linguist. Comput. 16, 421-444 (2001).

78. Binongo, J. N. G. & Smith, M. W. A. The application of principal component analysis to stylometry. Lit. Linguist. Comput. 14, 445-466 (1999).

79. Hoorn, J. F., Frank, S. L., Kowalczyk, W. & van der Ham, F. Neural network identification of poets using letter sequences. Lit. Linguist. Comput. 14, 311-338 (1999).

80. Leopold, E. & Kindermann, J. Text categorization with support vector machines. How to represent texts in input space? Machine Learn. 46, 423-444 (2002).

81. Holmes, D. I. & Forsyth, R. S. The Federalist revisited: new directions in authorship attribution. Lit. Linguist. Comput. 10, 111-127 (1995).

82. Altman, R. B. & Raychaudhuri, S. Whole-genome expression analysis: challenges beyond clustering. Curr. Opin. Struct. Biol. 11, 340-347 (2001).

83. Searls, D. B. Mining the bibliome. Pharmacogenomics J. 1, 88-89 (2001).

84. Popov, O., Segal, D. M. & Trifonov, E. N. Linguistic complexity of protein sequences as compared to texts of human languages. Biosystems 38, 65-74 (1996).

85. Trifonov, E. N. Interfering contexts of regulatory sequence elements. Comput. Appl. Biosci. 12, 423-429 (1996).

86. Spencer, M. & Howe, C. Estimating distances between manuscripts based on copying errors. Lit. Linguist. Comput. 16, 467-484 (2001).

87. Barbrook, A. C., Howe, C. J., Blake, N. & Robinson, P. The phylogeny of the Canterbury Tales. Nature 394, 839 (1998).

88. Platnick, N. I. & Cameron, H. D. Cladistic methods in textual, linguistic, and phylogenetic analysis. Syst. Zool. 26, 380-385 (1977).

89. Tanselle, G. T. Literature and Artifacts (Bibliographical Society of the University of Virginia, Charlottesville, VA, 1998).

90. Ferrer, D. Hypertextual representation of literary working papers. Lit. Linguist. Comput. 10, 143-145 (1995).

Comments to: junwang4 you-know-at gmail.com. Last update: 2/3/09