Loading aligned sequence data

We can load aligned sequence data using the load_aligned app. When making the app, you can optionally provide arguments for the molecular type of the sequence and the format of the data.

Loading aligned DNA sequences from a single fasta file

Here we load the brca1 gene in bats, providing the molecular type (moltype="dna") and file format (format_name="fasta").

Loading aligned protein sequences from a single phylip file

Here we load a globin alignment, providing the molecular type (moltype="protein") and file format (format_name="phylip").

Loading aligned DNA sequences from multiple fasta files

In the above examples, the result is a single alignment, which could have been achieved using standard cogent3 (load_aligned_seqs()). The real power of apps is for batch processing of a large number of files.

To apply apps to multiple files we need to set two things up:

1. A data store that identifies the files we are interested in

Here, we create a data store containing all the files with the “.fasta” suffix in the data directory, limiting the data store to two members to keep the example minimal.

2. A composed process that defines our workflow

In this example, our process loads the sequences, filters the sequences to keep only the third codon position, and then writes the filtered sequences to a data store.

Note

Apps that are “writers” require a data store to write to. See the cogent3 documentation on writers to learn more.

Tip

When running this code on your machine, remember to replace path_to_dir with an actual directory path.

Now we’re good to go: we can apply process to our data store!

result is a data store. Indexing it returns individual data members, each of which is one of our alignments. We can take a closer look by calling the .read() method on a data member (here truncating the output to the first 50 characters).