Strings and Sequences in Computer Science

Strings and Sequences in Computer Science

Bioinformatics Algorithms (Fundamental Algorithms, module 2) Zsuzsanna Lipt´ak Masters in Medical Bioinformatics academic year 2018/19, II semester Strings and Sequences in Computer Science • its elements are called characters or letters • jΣj is the size of the alphabet (number of different characters) • a string over Σ is a finite sequence of characters from Σ • we write strings as s = s1s2 ::: sn i.e. si is the i'th character of s N.B.: We number strings from 1, not from 0 Some formalism on strings • Σ a finite set called alphabet 2 / 7 • jΣj is the size of the alphabet (number of different characters) • a string over Σ is a finite sequence of characters from Σ • we write strings as s = s1s2 ::: sn i.e. si is the i'th character of s N.B.: We number strings from 1, not from 0 Some formalism on strings • Σ a finite set called alphabet • its elements are called characters or letters 2 / 7 • a string over Σ is a finite sequence of characters from Σ • we write strings as s = s1s2 ::: sn i.e. si is the i'th character of s N.B.: We number strings from 1, not from 0 Some formalism on strings • Σ a finite set called alphabet • its elements are called characters or letters • jΣj is the size of the alphabet (number of different characters) 2 / 7 • we write strings as s = s1s2 ::: sn i.e. si is the i'th character of s N.B.: We number strings from 1, not from 0 Some formalism on strings • Σ a finite set called alphabet • its elements are called characters or letters • jΣj is the size of the alphabet (number of different characters) • a string over Σ is a finite sequence of characters from Σ 2 / 7 N.B.: We number strings from 1, not from 0 Some formalism on strings • Σ a finite set called alphabet • its elements are called characters or letters • jΣj is the size of the alphabet (number of different characters) • a string over Σ is a finite sequence of characters from Σ • we write strings as s = s1s2 ::: sn i.e. si is the i'th character of s 2 / 7 Some formalism on strings • Σ a finite set called alphabet • its elements are called characters or letters • jΣj is the size of the alphabet (number of different characters) • a string over Σ is a finite sequence of characters from Σ • we write strings as s = s1s2 ::: sn i.e. si is the i'th character of s N.B.: We number strings from 1, not from 0 2 / 7 • is the empty string, the (unique) string of length 0 • Σn is the set of strings of length n ∗ S1 n 0 1 2 • Σ = n=0 Σ = Σ [ Σ [ Σ [ ::: is the set of all strings over Σ Some formalism on strings (cont.) • jsj is the length of string s 3 / 7 • Σn is the set of strings of length n ∗ S1 n 0 1 2 • Σ = n=0 Σ = Σ [ Σ [ Σ [ ::: is the set of all strings over Σ Some formalism on strings (cont.) • jsj is the length of string s • is the empty string, the (unique) string of length 0 3 / 7 ∗ S1 n 0 1 2 • Σ = n=0 Σ = Σ [ Σ [ Σ [ ::: is the set of all strings over Σ Some formalism on strings (cont.) • jsj is the length of string s • is the empty string, the (unique) string of length 0 • Σn is the set of strings of length n 3 / 7 = Σ0 [ Σ1 [ Σ2 [ ::: is the set of all strings over Σ Some formalism on strings (cont.) • jsj is the length of string s • is the empty string, the (unique) string of length 0 • Σn is the set of strings of length n ∗ S1 n • Σ = n=0 Σ 3 / 7 Some formalism on strings (cont.) • jsj is the length of string s • is the empty string, the (unique) string of length 0 • Σn is the set of strings of length n ∗ S1 n 0 1 2 • Σ = n=0 Σ = Σ [ Σ [ Σ [ ::: is the set of all strings over Σ 3 / 7 • RNA: Σ = fA,C,G,Ug, again alphabet size is 4 • protein: Σ = fA,C,D,E,F,. ,W,Yg, alphabet size is 20, ANRFYWNL is a string over Σ of length 8 • English alphabet: Σ = fa,b,c,. ,x,y,zg of size 26, alphabet is a string over Σ of length 8 Some formalism on strings: Examples Examples • DNA: Σ = fA,C,G,Tg, alphabet size jΣj = 4, s = ACCTG is a string of length 5 of Σ, with s1 = A; s2 = s3 = C; s4 = T; s5 = G. 4 / 7 • protein: Σ = fA,C,D,E,F,. ,W,Yg, alphabet size is 20, ANRFYWNL is a string over Σ of length 8 • English alphabet: Σ = fa,b,c,. ,x,y,zg of size 26, alphabet is a string over Σ of length 8 Some formalism on strings: Examples Examples • DNA: Σ = fA,C,G,Tg, alphabet size jΣj = 4, s = ACCTG is a string of length 5 of Σ, with s1 = A; s2 = s3 = C; s4 = T; s5 = G. • RNA: Σ = fA,C,G,Ug, again alphabet size is 4 4 / 7 • English alphabet: Σ = fa,b,c,. ,x,y,zg of size 26, alphabet is a string over Σ of length 8 Some formalism on strings: Examples Examples • DNA: Σ = fA,C,G,Tg, alphabet size jΣj = 4, s = ACCTG is a string of length 5 of Σ, with s1 = A; s2 = s3 = C; s4 = T; s5 = G. • RNA: Σ = fA,C,G,Ug, again alphabet size is 4 • protein: Σ = fA,C,D,E,F,. ,W,Yg, alphabet size is 20, ANRFYWNL is a string over Σ of length 8 4 / 7 Some formalism on strings: Examples Examples • DNA: Σ = fA,C,G,Tg, alphabet size jΣj = 4, s = ACCTG is a string of length 5 of Σ, with s1 = A; s2 = s3 = C; s4 = T; s5 = G. • RNA: Σ = fA,C,G,Ug, again alphabet size is 4 • protein: Σ = fA,C,D,E,F,. ,W,Yg, alphabet size is 20, ANRFYWNL is a string over Σ of length 8 • English alphabet: Σ = fa,b,c,. ,x,y,zg of size 26, alphabet is a string over Σ of length 8 4 / 7 • t is a substring of s if t = or t = si ::: sj for some 1 ≤ i ≤ j ≤ n (i.e., a "contiguous piece" of s) CCT; AC;::: • t is a prefix of s if t = or t = s1 ::: sj for some 1 ≤ j ≤ n (i.e., a "beginning" of s) AC; ACCTG;::: • t is a suffix of s if t = or t = si ::: sn for some 1 ≤ i ≤ n (i.e., an "end" of s) CCTG; G;::: • t is a subsequence of s if t can be obtained from s by deleting some (possibly 0, possibly all) characters from s AT; CCT;::: N.B. string = sequence, but substring 6= subsequence! Some formalism on strings Let s = s1 ::: sn be a string over Σ. ex. s = ACCTG 5 / 7 CCT; AC;::: • t is a prefix of s if t = or t = s1 ::: sj for some 1 ≤ j ≤ n (i.e., a "beginning" of s) AC; ACCTG;::: • t is a suffix of s if t = or t = si ::: sn for some 1 ≤ i ≤ n (i.e., an "end" of s) CCTG; G;::: • t is a subsequence of s if t can be obtained from s by deleting some (possibly 0, possibly all) characters from s AT; CCT;::: N.B. string = sequence, but substring 6= subsequence! Some formalism on strings Let s = s1 ::: sn be a string over Σ. ex. s = ACCTG • t is a substring of s if t = or t = si ::: sj for some 1 ≤ i ≤ j ≤ n (i.e., a "contiguous piece" of s) 5 / 7 • t is a prefix of s if t = or t = s1 ::: sj for some 1 ≤ j ≤ n (i.e., a "beginning" of s) AC; ACCTG;::: • t is a suffix of s if t = or t = si ::: sn for some 1 ≤ i ≤ n (i.e., an "end" of s) CCTG; G;::: • t is a subsequence of s if t can be obtained from s by deleting some (possibly 0, possibly all) characters from s AT; CCT;::: N.B. string = sequence, but substring 6= subsequence! Some formalism on strings Let s = s1 ::: sn be a string over Σ. ex. s = ACCTG • t is a substring of s if t = or t = si ::: sj for some 1 ≤ i ≤ j ≤ n (i.e., a "contiguous piece" of s) CCT; AC;::: 5 / 7 AC; ACCTG;::: • t is a suffix of s if t = or t = si ::: sn for some 1 ≤ i ≤ n (i.e., an "end" of s) CCTG; G;::: • t is a subsequence of s if t can be obtained from s by deleting some (possibly 0, possibly all) characters from s AT; CCT;::: N.B. string = sequence, but substring 6= subsequence! Some formalism on strings Let s = s1 ::: sn be a string over Σ. ex. s = ACCTG • t is a substring of s if t = or t = si ::: sj for some 1 ≤ i ≤ j ≤ n (i.e., a "contiguous piece" of s) CCT; AC;::: • t is a prefix of s if t = or t = s1 ::: sj for some 1 ≤ j ≤ n (i.e., a "beginning" of s) 5 / 7 • t is a suffix of s if t = or t = si ::: sn for some 1 ≤ i ≤ n (i.e., an "end" of s) CCTG; G;::: • t is a subsequence of s if t can be obtained from s by deleting some (possibly 0, possibly all) characters from s AT; CCT;::: N.B.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    31 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us