cogent3.core.seq_storage.AlignedSeqsData
class AlignedSeqsData(*, gapped_seqs: NumpyIntArrayType, names: Collection[str], alphabet: c3_alphabet.CharAlphabet[Any], ungapped_seqs: dict[str, NumpyIntArrayType] | None = None, gaps: Mapping[str, NumpyIntArrayType] | None = None, offset: dict[str, int] | None = None, align_len: int | None = None, check: bool = True, reversed_seqs: set[str] | None = None)
The builtin cogent3 implementation of aligned sequence storage underlying an Alignment. Indexing this object returns an AlignedDataView, which can realise the corresponding slice as a string, bytes, or numpy array, gapped or ungapped.
Attributes
align_len: Return the length of the alignment.
alphabet: the character alphabet for validating, encoding, decoding sequences
names: returns the names of the sequences in the storage
offset: returns the offset of each sequence in the Alignment
reversed_seqs: names of sequences that are reverse complemented
Methods
add_seqs(seqs[, force_unique_keys, offset]): Returns a new AlignedSeqsData object with added sequences.
copy(**kwargs): shallow copy of self
from_names_and_array(*, names, data, alphabet): Construct an AlignedSeqsData object from a list of names and a numpy array of aligned sequence data.
from_seqs(*, data, alphabet, **kwargs): Construct an AlignedSeqsData object from a dict of aligned sequences
from_seqs_and_gaps(*, seqs, gaps, alphabet, ...): Construct an AlignedSeqsData object from a dict of ungapped sequences and a corresponding dict of gap data.
get_gapped_seq_array(*, seqid[, start, ...]): Return sequence data corresponding to seqid as an array of indices.
get_gapped_seq_bytes(*, seqid[, start, ...]): Return sequence corresponding to seqid as a bytes string.
get_gapped_seq_str(*, seqid[, start, stop, step]): Return sequence corresponding to seqid as a string.
get_gaps(seqid): returns the gap data for seqid
get_hash(seqid): returns hash of seqid
get_pos_range(names[, start, stop, step]): returns an array of the selected positions for names.
get_positions(names, positions): returns alignment positions for names
get_seq_array(*, seqid[, start, stop, step]): Return ungapped sequence corresponding to seqid as an array of indices.
get_seq_bytes(*, seqid[, start, stop, step]): Return ungapped sequence corresponding to seqid as a bytes string.
get_seq_length(seqid): return length of the unaligned seq for seqid
get_seq_str(*, seqid[, start, stop, step]): Return ungapped sequence corresponding to seqid as a string.
get_ungapped(name_map[, start, stop, step]): Returns a dictionary of sequence data with no gaps or missing characters and a dictionary with information to construct a new SequenceCollection via make_unaligned_seqs.
get_view(seqid[, slice_record]): returns view of aligned sequence data for seqid
to_alphabet(alphabet[, check_valid]): Returns a new AlignedSeqsData object with the same underlying data with a new alphabet.
variable_positions(names[, start, stop, step]): returns absolute indices of positions that have more than one state
Notes
Methods on this object only accept plus strand start, stop and step indices for selecting segments of data. The object can return the gap coordinates for a sequence as used by IndelMap.
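As a rough illustration of how an ungapped sequence plus gap coordinates can stand in for a gapped sequence, the sketch below splits a gapped string into the two parts and rebuilds it. It is plain Python, not the cogent3 API: the helper names `split_gapped` and `rebuild_gapped` are hypothetical, and the pair layout used here (position in the ungapped sequence, cumulative gap length) is an assumption that should be checked against the IndelMap documentation.

```python
def split_gapped(gapped: str, gap_char: str = "-") -> tuple[str, list[tuple[int, int]]]:
    """Split a gapped sequence into its ungapped form and gap coordinates.

    Each gap run is recorded as a pair (assumed layout, for illustration):
    (position in the ungapped sequence where the run is inserted,
     cumulative gap length up to and including that run).
    """
    ungapped: list[str] = []
    gaps: list[tuple[int, int]] = []
    cum_length = 0
    in_gap = False
    for char in gapped:
        if char == gap_char:
            cum_length += 1
            if in_gap:
                # extend the current gap run
                gaps[-1] = (gaps[-1][0], cum_length)
            else:
                # start a new gap run at the current ungapped position
                gaps.append((len(ungapped), cum_length))
                in_gap = True
        else:
            ungapped.append(char)
            in_gap = False
    return "".join(ungapped), gaps


def rebuild_gapped(ungapped: str, gaps: list[tuple[int, int]], gap_char: str = "-") -> str:
    """Reinsert gap runs into an ungapped sequence using the pairs above."""
    parts: list[str] = []
    prev_pos = 0
    prev_cum = 0
    for pos, cum in gaps:
        parts.append(ungapped[prev_pos:pos])
        # the length of this run is the increase in cumulative gap length
        parts.append(gap_char * (cum - prev_cum))
        prev_pos, prev_cum = pos, cum
    parts.append(ungapped[prev_pos:])
    return "".join(parts)


seq, gap_coords = split_gapped("AC--GT-A")
print(seq, gap_coords)
print(rebuild_gapped(seq, gap_coords))
```

Storing the ungapped sequence with compact gap coordinates means the gapped and ungapped forms of a sequence can both be produced from the same underlying data, which is the kind of choice the `get_gapped_seq_*` and `get_seq_*` method pairs expose.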