Hamming distance #15


Open

danielle-pinto wants to merge 2 commits into main from 2026-01-27-hamming-distance

Collaborator

danielle-pinto commented Jan 27, 2026 (edited)

Tutorial for Hamming Distance problem

I added two different ways to solve the problem: with a function I wrote, as well as a function in the BioAlignments package. Please let me know if there's anything else that would be good to add to this tutorial!

danielle-pinto added 2 commits

January 27, 2026 11:19


          outline of hamming distance tutorial

d62948c


          slight text edits

7e2b404

danielle-pinto requested a review from kescobo

January 27, 2026 21:23

github-actions bot commented Jan 27, 2026

Once the build has completed, you can preview your PR at this URL: https://biojulia.dev/BiojuliaDocs/previews/PR15/

danielle-pinto commented


docs/src/rosalind/06-hamming.md

+              Let's give this a try!
+              ```julia
+              SampleSeqA = "GAGCCTACTAACGGGAT"

Collaborator Author

danielle-pinto Jan 27, 2026

I wasn't able to find a source of truth for variable naming conventions in Julia, so I'm using camel case here.

Member

kescobo Jan 28, 2026

https://docs.julialang.org/en/v1/manual/style-guide/index.html is the official one; there are a couple of unofficial ones as well.

kescobo requested changes


docs/src/rosalind/06-hamming.md

		```


		To calculate the Hamming Distance between two strings/sequences, the two strings/DNA sequences must be the same length. We can calculate the Hamming Distance by looping over the characters in one of the strings and checking if the corresponding character at the same index in the other string matches. Each mismatch will cause 1 to be added to a `counter` variable. At the end of the loop, we can return the total value of the `counter` variable.
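A minimal sketch of the loop described in this paragraph, using the lowercase names suggested elsewhere in this review (`hamming`, `seq_a`, `seq_b`). Throwing `DimensionMismatch` for unequal lengths is an assumption, not the PR's code, and the index-based comparison assumes ASCII input such as DNA, so that every index of `seq_a` is also a valid index into `seq_b`.

```julia
# Hamming distance: number of positions at which two equal-length sequences differ.
# Assumes ASCII sequences (e.g. DNA), so an index of `seq_a` is also valid in `seq_b`.
function hamming(seq_a::AbstractString, seq_b::AbstractString)
    # the distance is only defined for sequences of the same length
    if length(seq_a) != length(seq_b)
        throw(DimensionMismatch("sequences must have the same length"))
    end
    counter = 0
    for i in eachindex(seq_a)
        # add 1 for every position where the characters disagree
        if seq_a[i] != seq_b[i]
            counter += 1
        end
    end
    return counter
end

seq_a = "GAGCCTACTAACGGGAT"
seq_b = "CATCGTAATGACGGCCT"
println(hamming(seq_a, seq_b))  # prints 7
```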

Member

kescobo Jan 27, 2026

Use semantic line breaks (sembr.org)

docs/src/rosalind/06-hamming.md

+              Let's give this a try!
+              ```julia
+              SampleSeqA = "GAGCCTACTAACGGGAT"

Member

kescobo Jan 27, 2026

Suggested change

- SampleSeqA = "GAGCCTACTAACGGGAT"
+ seq_a = "GAGCCTACTAACGGGAT"

Julia convention is to use UpperCamelCase only for types and modules, and typically short, all-lowercase variable names. For really long names, or where you really need word separation, use snake_case.
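To make the convention concrete, a small hedged sketch: `SequencePair` is a hypothetical type invented for illustration and is not part of the PR. The type name is UpperCamelCase, while fields, variables and the function use lowercase or snake_case names. The index-based comparison assumes equal-length ASCII sequences.

```julia
# Types (and modules) use UpperCamelCase.
struct SequencePair
    seq_a::String   # fields and variables: short, lowercase; snake_case if needed
    seq_b::String
end

# Functions are lowercase too, without redundant prefixes such as `calc`.
# Index-based comparison assumes equal-length ASCII sequences.
hamming(pair::SequencePair) =
    count(i -> pair.seq_a[i] != pair.seq_b[i], eachindex(pair.seq_a))

sample_pair = SequencePair("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT")
println(hamming(sample_pair))  # prints 7
```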

docs/src/rosalind/06-hamming.md

+                  SeqLength = length(SeqA)
+                  # check if the strings are empty
+                  if SeqLength == 0

Member

kescobo Jan 27, 2026

Suggested change

- if SeqLength == 0
+ if isempty(seq_a)

docs/src/rosalind/06-hamming.md

+                  # check if the strings are empty
+                  if SeqLength == 0
+                      return 0

Member

kescobo Jan 27, 2026

This works in the context of a shell script, but you need to explicitly throw the error (`error("empty sequences")` or `throw(ErrorException("empty sequences"))`).
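A hedged sketch of the reviewer's suggestion: check for empty input with `isempty` and throw rather than silently returning a value. `check_nonempty` is a hypothetical helper name chosen for this example; in the tutorial the check would sit at the top of the Hamming function itself.

```julia
# Hypothetical helper: fail loudly on empty input instead of returning 0.
function check_nonempty(seq_a::AbstractString, seq_b::AbstractString)
    if isempty(seq_a) || isempty(seq_b)
        # error(msg) throws ErrorException(msg);
        # throw(ErrorException("empty sequences")) would be equivalent
        error("empty sequences")
    end
    return nothing
end

try
    check_nonempty("", "")
catch err
    println(typeof(err))  # prints ErrorException
    println(err.msg)      # prints empty sequences
end
```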

docs/src/rosalind/06-hamming.md

+              SampleSeqA = "GAGCCTACTAACGGGAT"
+              SampleSeqB = "CATCGTAATGACGGCCT"
+              function calcHamming(SeqA, SeqB)

Member

kescobo Jan 27, 2026

Suggested change

- function calcHamming(SeqA, SeqB)
+ function hamming(SeqA, SeqB)

Same naming convention for functions as for variables, and prefixes like `calc` are redundant.

docs/src/rosalind/06-hamming.md

+              SampleSeqB = "CATCGTAATGACGGCCT"
+              function calcHamming(SeqA, SeqB)
+                  SeqLength = length(SeqA)

Member

kescobo Jan 27, 2026

Didn't really need this. Could maybe also check that `seq_a` and `seq_b` have the same length and throw an error if not.
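A sketch of the suggested length check, calling `length` directly instead of storing it in a separate variable. `check_same_length` is a hypothetical name, and choosing Base's `DimensionMismatch` over a plain `error(...)` is an assumption; the review only asks for "an error".

```julia
# Hypothetical helper: reject sequences of different lengths up front.
# DimensionMismatch is Base Julia's exception type for size disagreements.
function check_same_length(seq_a::AbstractString, seq_b::AbstractString)
    len_a, len_b = length(seq_a), length(seq_b)
    len_a == len_b || throw(DimensionMismatch("sequence lengths differ: $len_a vs $len_b"))
    return len_a
end

println(check_same_length("GAGC", "CATC"))  # prints 4

try
    check_same_length("GAGC", "CAT")
catch err
    println(err.msg)  # prints sequence lengths differ: 4 vs 3
end
```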

docs/src/rosalind/06-hamming.md

+                  end
+                  mismatches = 0
+                  for i in 1:SeqLength

Member

kescobo Jan 27, 2026

Suggested change

- for i in 1:SeqLength
+ for i in eachindex(seq_a)

An alternative to this loop that's a bit more idiomatic/functional would be something like `count(i -> seq_a[i] != seq_b[i], eachindex(seq_a))`.

In a tutorial, it's often nice to show multiple approaches, though.
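A hedged sketch of a few interchangeable one-liners a tutorial could show side by side. The first is the reviewer's `count` suggestion; the `zip`-based variants and the names `hamming_count`, `hamming_zip` and `hamming_sum` are illustrative additions, not part of the PR.

```julia
seq_a = "GAGCCTACTAACGGGAT"
seq_b = "CATCGTAATGACGGCCT"

# count with a predicate over the indices of `a`, as suggested in the review
# (assumes ASCII input so that indices of `a` are valid in `b`)
hamming_count(a, b) = count(i -> a[i] != b[i], eachindex(a))

# count over pairs of characters; zip never indexes into the strings
hamming_zip(a, b) = count(((x, y),) -> x != y, zip(a, b))

# sum of Booleans from a generator (true counts as 1)
hamming_sum(a, b) = sum(x != y for (x, y) in zip(a, b))

println(hamming_count(seq_a, seq_b))  # prints 7
println(hamming_zip(seq_a, seq_b))    # prints 7
println(hamming_sum(seq_a, seq_b))    # prints 7
```

The `zip`-based versions do not depend on byte indices lining up, but `zip` silently stops at the end of the shorter string, so a length check is still needed before calling any of them.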

docs/src/rosalind/06-hamming.md


		BioAlignmentsHamming = BioAlignments.hamming_distance(Int64, "GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT")

		BioAlignmentsHamming[1]

Member

kescobo Jan 27, 2026

Suggested change

- BioAlignmentsHamming[1]
+ bio_hamming[1]

docs/src/rosalind/06-hamming.md

+              ```julia
+              # Double check that we got the same values from both outputs
+              @assert calcHamming(SampleSeqA, SampleSeqB) == BioAlignmentsHamming[1]

Member

kescobo Jan 28, 2026

Maybe also show the hamming distances from https://github.com/JuliaStats/Distances.jl
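In the same spirit as the `@assert` cross-check, a hedged sketch that compares two independent implementations on the sample pair and on random equal-length DNA strings. It uses only Base Julia and the `Random` standard library, so it does not exercise BioAlignments or Distances.jl; `hamming_loop`, `hamming_count` and the seed `42` are illustrative choices.

```julia
using Random

# Reference implementation: explicit loop over paired characters.
function hamming_loop(seq_a::AbstractString, seq_b::AbstractString)
    mismatches = 0
    for (a, b) in zip(seq_a, seq_b)
        mismatches += a != b   # a Bool adds as 0 or 1
    end
    return mismatches
end

# Implementation under test: count over indices (fine for ASCII DNA strings).
hamming_count(seq_a, seq_b) = count(i -> seq_a[i] != seq_b[i], eachindex(seq_a))

# Fixed sample pair first.
sample_a = "GAGCCTACTAACGGGAT"
sample_b = "CATCGTAATGACGGCCT"
@assert hamming_loop(sample_a, sample_b) == hamming_count(sample_a, sample_b)

# Then random DNA pairs of equal length (including length 0).
rng = MersenneTwister(42)
bases = collect("ACGT")
for _ in 1:100
    n = rand(rng, 0:50)
    a = String(rand(rng, bases, n))
    b = String(rand(rng, bases, n))
    @assert hamming_loop(a, b) == hamming_count(a, b)
end
println("all checks passed")
```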
