2026 03 04 cookbook sequences#28

Open
danielle-pinto wants to merge 3 commits into main from 2026-03-04-cookbook-sequences

@danielle-pinto
Collaborator

First cookbook tutorial, explaining how to read in different bioinformatics file types.

@danielle-pinto danielle-pinto requested a review from kescobo March 6, 2026 20:30
Collaborator Author

Thought this file would be alright to add to Git since it is tiny. But I can also just have the user curl it themselves.

Member

Yeah, seems fine

@github-actions
github-actions bot commented Mar 6, 2026

All of the nucleotides in all of the reads have a quality score of `$`, which corresponds to a probability of error of 0.50119.
More information about how to convert ASCII values to quality scores is available [here](https://people.duke.edu/~ccc14/duke-hts-2018/bioinformatics/quality_scores.html).
This would be quite poor if we were looking at Illumina data.
However, because of how PacBio chemistry works,
Collaborator Author

Want to confirm this information/explanation with you

Collaborator Author

I've never used PacBio data before!

Member

Yeah, I think this is right, though maybe just grab an Illumina dataset instead (or something from FormatSpecimens) so as not to need this particular bit. No need to overcomplicate things.
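
For anyone double-checking the number in this thread, here is a minimal sketch of the conversion, assuming the standard Phred+33 (Sanger) encoding. The helper names `phred_score` and `error_probability` are made up for illustration and are not part of any BioJulia package.

```julia
# Hypothetical helpers, assuming Phred+33 encoding (quality = ASCII code - 33).
phred_score(c::Char; offset::Integer = 33) = Int(c) - offset

# A Phred score Q corresponds to an error probability of 10^(-Q/10).
error_probability(q::Integer) = 10.0^(-q / 10)

q = phred_score('$')        # '$' is ASCII 36, so Q = 36 - 33 = 3
p = error_probability(q)    # 10^(-0.3)

println("Q = ", q, ", P(error) = ", round(p; digits = 5))
# prints: Q = 3, P(error) = 0.50119
```

This only confirms the arithmetic behind the `$` to 0.50119 statement; it does not say anything about how PacBio assigns quality values.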

The SRR (sample run accession number) is the unique identifier within SRA
and corresponds to the specific sequencing run.

In a later tutorial, we will discuss how to download this file in Julia using the SRR.
Collaborator Author

Is there any example code that can be shared on how to do this? Or I can show how this package can be used in one line on the terminal here.

Member

https://github.com/BioJulia/BioServices.jl is the canonical way.

But also, another useful addition to the cookbook would be showing how to call shell commands from Julia.
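
As a rough idea of what such a recipe could cover, here is a minimal sketch using only Julia's built-in command support. It assumes a Unix-like system where an `echo` executable is on the `PATH`; the variable names are illustrative.

```julia
# Backticks build a Cmd object; nothing runs until it is passed to run or read.
cmd = `echo hello from the shell`

run(cmd)                   # runs the command, output goes to the terminal
out = read(cmd, String)    # runs it again, capturing stdout as a String
println(strip(out))

# Interpolated values are passed as single arguments, so no manual quoting is needed.
acc = "SRR12147540"        # accession number taken from this thread
listing = read(`echo $acc`, String)
println(strip(listing))
```

Because commands are not passed through a shell, pipes and redirection are expressed with `pipeline` rather than `|` and `>`.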

This cookbook will provide a series of "recipes" that will help you get started quickly with BioJulia so you can start doing some bioinformatics!

We have tutorials for reading in files, performing alignments, and using tools such as BLAST,
Member

We *will* have, no?

Member

Though another option would be to bring in FormatSpecimens.jl... maybe not for the very first one.

Comment on lines +134 to +137
```
curl -L --retry 5 --retry-delay 2 \
"https://trace.ncbi.nlm.nih.gov/Traces/sra-reads-be/fastq?acc=SRR12147540" \
| gzip -c > SRR12147540.fastq.gz
```
Member

Re: command line - this can be

```julia
run(pipeline(
    `curl -L --retry 5 --retry-delay 2 "https://trace.ncbi.nlm.nih.gov/Traces/sra-reads-be/fastq?acc=SRR12147540"`,
    `gzip -c`,
    "SRR12147540.fastq.gz"
    )
)
```

or

```julia
run(pipeline(
    `curl -L --retry 5 --retry-delay 2 "https://trace.ncbi.nlm.nih.gov/Traces/sra-reads-be/fastq?acc=SRR12147540"`;
    stdout=pipeline(`gzip -c`; stdout="SRR12147540.fastq.gz")
    )
)
```

2 participants