Hamming distance #15
Once the build has completed, you can preview your PR at this URL: https://biojulia.dev/BiojuliaDocs/previews/PR15/
| Let's give this a try! | ||
|
|
||
| ```julia | ||
| SampleSeqA = "GAGCCTACTAACGGGAT" |
I wasn't able to find a source of truth for variable naming conventions in Julia, so I'm using camel case here.
https://docs.julialang.org/en/v1/manual/style-guide/index.html is the official one; there are also a couple of unofficial ones.
| ``` | ||
|
|
||
|
|
||
| To calculate the Hamming Distance between two strings/sequences, the two strings/DNA sequences must be the same length. We can calculate the Hamming Distance by looping over the characters in one of the strings and checking if the corresponding character at the same index in the other string matches. Each mismatch will cause 1 to be added to a `counter` variable. At the end of the loop, we can return the total value of the `counter` variable. |
Use semantic line breaks (sembr.org)
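For reference, the loop described in the tutorial paragraph above could be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the name `hamming_loop` is hypothetical, and the sketch assumes both inputs are ASCII strings of the same length, so that every index of `seq_a` is also valid in `seq_b`.

```julia
# Hypothetical helper: counts mismatching positions with an explicit loop.
# Assumes equal-length ASCII strings; no validation is done here.
function hamming_loop(seq_a, seq_b)
    mismatches = 0
    for i in eachindex(seq_a)
        # add 1 for every position where the characters differ
        if seq_a[i] != seq_b[i]
            mismatches += 1
        end
    end
    return mismatches
end

seq_a = "GAGCCTACTAACGGGAT"
seq_b = "CATCGTAATGACGGCCT"
println(hamming_loop(seq_a, seq_b))  # prints 7
```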
| Let's give this a try! | ||
|
|
||
| ```julia | ||
| SampleSeqA = "GAGCCTACTAACGGGAT" |
| SampleSeqA = "GAGCCTACTAACGGGAT" | |
| seq_a = "GAGCCTACTAACGGGAT" |
Julia convention is to capitalize only types and modules (CamelCase), and to use short, all-lowercase names for variables. For really long names, or where you really need word separation, use snake_case.
| SeqLength = length(SeqA) | ||
|
|
||
| # check if the strings are empty | ||
| if SeqLength == 0 |
| if SeqLength == 0 | |
| if isempty(seq_a) |
|
|
||
| # check if the strings are empty | ||
| if SeqLength == 0 | ||
| return 0 |
This works in the context of a shell script, but you need to explicitly throw the error (`error("empty sequences")` or `throw(ErrorException("empty sequences"))`).
| SampleSeqA = "GAGCCTACTAACGGGAT" | ||
| SampleSeqB = "CATCGTAATGACGGCCT" | ||
|
|
||
| function calcHamming(SeqA, SeqB) |
| function calcHamming(SeqA, SeqB) | |
| function hamming(SeqA, SeqB) |
Same naming convention for functions as for variables, and things like calc are redundant
| SampleSeqB = "CATCGTAATGACGGCCT" | ||
|
|
||
| function calcHamming(SeqA, SeqB) | ||
| SeqLength = length(SeqA) |
Didn't really need this. Could maybe also check that seq_a and seq_b are the same length and throw an error if not.
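A possible shape for the validation discussed in these comments is sketched below. The function name `hamming_checked` and the wording of the length-mismatch message are assumptions made for illustration; the empty-sequence error uses the `error("empty sequences")` call suggested earlier in the review.

```julia
# Hypothetical variant that validates its inputs before counting mismatches.
function hamming_checked(seq_a, seq_b)
    # explicitly throw instead of silently returning 0 for empty input
    if isempty(seq_a) || isempty(seq_b)
        error("empty sequences")
    end
    # Hamming distance is only defined for sequences of equal length
    if length(seq_a) != length(seq_b)
        error("sequences must be the same length")
    end
    mismatches = 0
    # zip pairs up the characters position by position
    for (a, b) in zip(seq_a, seq_b)
        if a != b
            mismatches += 1
        end
    end
    return mismatches
end
```

Both checks raise an `ErrorException`; a more specific exception type could be used instead if callers need to tell the two cases apart.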
| end | ||
|
|
||
| mismatches = 0 | ||
| for i in 1:SeqLength |
| for i in 1:SeqLength | |
| for i in eachindex(seq_a) |
An alternative to this loop that's a bit more idiomatic/functional would be something like `count(i -> seq_a[i] != seq_b[i], eachindex(seq_a))`.
In a tutorial, it's often nice to show multiple approaches, though.
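As a rough sketch of the functional form mentioned in this comment, the `count` call can be wrapped in a one-line function. The name `hamming_count` is hypothetical, and the sketch assumes equal-length ASCII strings, since it indexes `seq_b` with the indices of `seq_a`.

```julia
# Hypothetical one-liner: count the indices at which the two strings differ.
# Assumes equal-length ASCII strings; no validation is done here.
hamming_count(seq_a, seq_b) = count(i -> seq_a[i] != seq_b[i], eachindex(seq_a))

seq_a = "GAGCCTACTAACGGGAT"
seq_b = "CATCGTAATGACGGCCT"
println(hamming_count(seq_a, seq_b))  # prints 7
```

`count` takes a predicate and an iterable and returns how many elements satisfy the predicate, which removes the need for a separate counter variable.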
|
|
||
| BioAlignmentsHamming = BioAlignments.hamming_distance(Int64, "GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT") | ||
|
|
||
| BioAlignmentsHamming[1] |
| BioAlignmentsHamming[1] | |
| bio_hamming[1] |
|
|
||
| ```julia | ||
| # Double check that we got the same values from both outputs | ||
| @assert calcHamming(SampleSeqA, SampleSeqB) == BioAlignmentsHamming[1] |
Maybe also show the hamming distances from https://github.com/JuliaStats/Distances.jl
Tutorial for Hamming Distance problem
I added two different ways to solve the problem: with a function I wrote, as well as a function in the BioAlignments package. Please let me know if there's anything else that would be good to add to this tutorial!