@danielle-pinto
Collaborator

@danielle-pinto danielle-pinto commented Jan 27, 2026

Tutorial for Hamming Distance problem

I added two different ways to solve the problem: with a function I wrote, as well as a function in the BioAlignments package. Please let me know if there's anything else that would be good to add to this tutorial!

@github-actions

Once the build has completed, you can preview your PR at this URL: https://biojulia.dev/BiojuliaDocs/previews/PR15/

Let's give this a try!

```julia
SampleSeqA = "GAGCCTACTAACGGGAT"
```
Collaborator Author

I wasn't able to find a source of truth for variable naming conventions in Julia, so I'm using camel case here.


To calculate the Hamming Distance between two strings/sequences, the two strings/DNA sequences must be the same length. We can calculate the Hamming Distance by looping over the characters in one of the strings and checking if the corresponding character at the same index in the other string matches. Each mismatch will cause 1 to be added to a `counter` variable. At the end of the loop, we can return the total value of the `counter` variable.
Member

Use semantic line breaks (sembr.org)
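
The loop described in the tutorial paragraph above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the name `hamming_loop` is hypothetical, and the sketch assumes both inputs are ASCII strings of equal length, so that every index of `seq_a` is also a valid index of `seq_b`.

```julia
# Minimal sketch of the counter-based approach (hypothetical name `hamming_loop`).
# Assumes ASCII strings of equal length; no validation is done here.
function hamming_loop(seq_a, seq_b)
    mismatches = 0
    for i in eachindex(seq_a)
        # add 1 for every position where the characters differ
        if seq_a[i] != seq_b[i]
            mismatches += 1
        end
    end
    return mismatches
end

seq_a = "GAGCCTACTAACGGGAT"
seq_b = "CATCGTAATGACGGCCT"
println(hamming_loop(seq_a, seq_b))  # prints 7
```

The sample pair differs at positions 1, 3, 5, 8, 10, 15, and 16, which gives the 7 printed above.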

Let's give this a try!

```julia
SampleSeqA = "GAGCCTACTAACGGGAT"
```
Member

Suggested change
SampleSeqA = "GAGCCTACTAACGGGAT"
seq_a = "GAGCCTACTAACGGGAT"

Julia convention is to use CamelCase only for types and modules, and short, all-lowercase names for variables. For really long names, or where you really need word separation, use snake_case.

SeqLength = length(SeqA)

# check if the strings are empty
if SeqLength == 0
Member

Suggested change
if SeqLength == 0
if isempty(seq_a)


# check if the strings are empty
if SeqLength == 0
return 0
Member

This works in the context of a shell script, but in Julia you need to throw the error explicitly, with either `error("empty sequences")` or `throw(ErrorException("empty sequences"))`.
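
To illustrate the comment above, here is a small sketch of throwing explicitly on empty input. The helper name `require_nonempty` is hypothetical and not part of the PR; `error(msg)` is Base Julia and throws an `ErrorException` carrying the message.

```julia
# Hypothetical helper: reject empty input by throwing instead of returning 0.
function require_nonempty(seq)
    # error(msg) is shorthand for throw(ErrorException(msg))
    isempty(seq) && error("empty sequences")
    return seq
end

try
    require_nonempty("")
catch err
    println(typeof(err))  # prints ErrorException
    println(err.msg)      # prints empty sequences
end
```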

SampleSeqA = "GAGCCTACTAACGGGAT"
SampleSeqB = "CATCGTAATGACGGCCT"

function calcHamming(SeqA, SeqB)
Member

Suggested change
function calcHamming(SeqA, SeqB)
function hamming(SeqA, SeqB)

The same naming convention applies to functions as to variables, and prefixes like `calc` are redundant.

SampleSeqB = "CATCGTAATGACGGCCT"

function calcHamming(SeqA, SeqB)
SeqLength = length(SeqA)
Member

Didn't really need this. Could maybe also check that seq_a and seq_b are the same length and throw an error if not.
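
One possible shape for the length check suggested above is sketched here. The function name `check_same_length` is hypothetical, and `DimensionMismatch` (a Base exception type) is only one reasonable choice; `ArgumentError` would work as well.

```julia
# Hypothetical guard: the Hamming distance is only defined for equal lengths.
function check_same_length(seq_a, seq_b)
    if length(seq_a) != length(seq_b)
        throw(DimensionMismatch(
            "sequences must be the same length, got $(length(seq_a)) and $(length(seq_b))"))
    end
    return nothing
end

check_same_length("ACGT", "ACGA")       # equal lengths: passes silently
try
    check_same_length("ACGT", "ACG")    # unequal lengths: throws
catch err
    println(err isa DimensionMismatch)  # prints true
end
```

Calling a guard like this at the top of the distance function would keep the counting loop free of validation logic.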

end

mismatches = 0
for i in 1:SeqLength
Member

Suggested change
for i in 1:SeqLength
for i in eachindex(seq_a)

An alternative to this loop that's a bit more idiomatic/functional would be something like `count(i -> seq_a[i] != seq_b[i], eachindex(seq_a))`.

In a tutorial, it's often nice to show multiple approaches, though.
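
The `count`-based alternative mentioned above might look like the sketch below. The function names are hypothetical. The first version indexes both strings and so assumes ASCII input of equal length; the second pairs characters with `zip`, which avoids indexing but silently stops at the end of the shorter input.

```julia
# Index-based: count the indices where the two characters differ.
hamming_count(seq_a, seq_b) = count(i -> seq_a[i] != seq_b[i], eachindex(seq_a))

# zip-based: count the character pairs that differ.
hamming_zip(seq_a, seq_b) = count(((a, b),) -> a != b, zip(seq_a, seq_b))

seq_a = "GAGCCTACTAACGGGAT"
seq_b = "CATCGTAATGACGGCCT"
println(hamming_count(seq_a, seq_b))  # prints 7
println(hamming_zip(seq_a, seq_b))    # prints 7
```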


BioAlignmentsHamming = BioAlignments.hamming_distance(Int64, "GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT")

BioAlignmentsHamming[1]
Member

Suggested change
BioAlignmentsHamming[1]
bio_hamming[1]


```julia
# Double check that we got the same values from both outputs
@assert calcHamming(SampleSeqA, SampleSeqB) == BioAlignmentsHamming[1]
```
Member

Maybe also show the Hamming distance from https://github.com/JuliaStats/Distances.jl
