How to Convert a TXT File to FASTA
By Maureen Bruen
Updated September 28, 2017
Researchers analyze protein sequence data in clinical studies aimed at finding treatments for illnesses. Protein sequence data is stored in the FASTA (fast-all) format, a plain-text format that lets software programs tell each sequence's description apart from its data. By convention, each sequence data line in a FASTA file is no longer than 80 characters, and the sequence itself is written using the IUB/IUPAC (International Union of Biochemistry/International Union of Pure and Applied Chemistry) single-letter codes. Converting a TXT (plain text) file to FASTA format involves adding a FASTA description line to an existing text file and editing its protein sequence data lines so they follow these conventions. Text editor programs like Notepad make this simple to do.
Open the protein sequence text file you want to edit in a text editing program such as Notepad.
Edit or add the description line to follow the FASTA format. For example, >gi|129295|sp|P01013|OVAX_CHICK GENE X PROTEIN (OVALBUMIN-RELATED) is a valid FASTA description line. This line identifies and describes the sequence data lines that follow. The FASTA format requires the greater-than symbol (>) as the first character of the description line so the software program can recognize the line as descriptive information and avoid processing it as a protein sequence data line.
Press the "Enter" key to insert a line break once the description line is edited.
Edit or add the protein sequence data lines so they conform to the IUB/IUPAC standard codes. The IUB/IUPAC standard uses single letters of the alphabet to represent amino acids in protein sequences and nucleotides in nucleic acid sequences. For example, QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKPVQMMCMNNSFNVATLPAE is one line of valid sequence data because every character in it is a recognized amino acid code. It starts with the letter "Q," representing glutamine, and ends with the letter "E," representing glutamate.
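One way to check a sequence data line before saving is to confirm that every character in it is a recognized code. The following is a minimal Python sketch of that check, not part of the Notepad workflow itself. The name invalid_characters and the exact alphabet in PROTEIN_CODES are assumptions made for illustration; some programs accept a narrower or wider set of letters.

```python
# Assumed alphabet: the 20 standard amino acid letters, the ambiguity codes
# B, Z, X and J, the rarer residues U and O, "*" (stop) and "-" (gap).
PROTEIN_CODES = set("ACDEFGHIKLMNPQRSTVWY" "BZXJUO" "*-")


def invalid_characters(line):
    """Return a sorted list of characters in `line` that are not in PROTEIN_CODES."""
    # Surrounding whitespace is ignored and lowercase letters are accepted.
    return sorted(set(line.strip().upper()) - PROTEIN_CODES)


example = "QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKPVQMMCMNNSFNVATLPAE"
print(invalid_characters(example))      # prints []
print(invalid_characters("QIKD1LV S"))  # prints [' ', '1']
```

An empty list means the line contains only letters from the assumed alphabet. Any characters that are returned, such as digits or embedded spaces, are the ones to remove or correct in the text editor.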
Add more sequence data lines, edit existing sequence data lines or add line breaks after 80 characters as needed. Adhering to the FASTA sequence data line standards and line breaks ensures that the program reads glutamine, glutamate and the other letter codes correctly. The letters in the IUB/IUPAC standard are not instructions. Each one is a code for a single amino acid or nucleotide, which the software program interprets when it processes FASTA-formatted data.
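Adding line breaks by hand is error-prone for long sequences, so the 80-character wrapping can also be done with a short script. This is a minimal Python sketch; the function name wrap_sequence is hypothetical, and the default width of 80 simply follows the convention described in this article.

```python
def wrap_sequence(sequence, width=80):
    """Split a sequence into lines of at most `width` characters."""
    # Remove existing line breaks and spaces so the wrapping starts clean.
    residues = "".join(sequence.split())
    return [residues[i:i + width] for i in range(0, len(residues), width)]


# A 170-character sequence becomes two full lines and one shorter line.
lines = wrap_sequence("A" * 170)
print([len(line) for line in lines])  # prints [80, 80, 10]
```

The function only re-breaks the text. It does not check whether the letters are valid codes.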
Click "File" and select "Save." If a save dialog box appears, click the "Save" button. The contents of your TXT file are now in FASTA format.
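For readers who would rather script the conversion than edit by hand, the steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive converter: it handles a single sequence per file, the function names text_to_fasta and convert_file are hypothetical, and the description text is supplied by the caller when the file does not already start with a ">" line.

```python
from pathlib import Path


def text_to_fasta(text, description, width=80):
    """Return `text` as a single FASTA record (assumes one sequence per file)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if lines and lines[0].startswith(">"):
        # Keep a description line that is already present.
        header, body = lines[0], lines[1:]
    else:
        # Otherwise build one from the caller-supplied description.
        header, body = ">" + description, lines
    # Join the data lines, drop stray spaces, then re-break at `width`.
    residues = "".join("".join(body).split())
    wrapped = [residues[i:i + width] for i in range(0, len(residues), width)]
    return "\n".join([header] + wrapped) + "\n"


def convert_file(txt_path, fasta_path, description):
    """Read a plain-text sequence file and write a FASTA-formatted copy."""
    text = Path(txt_path).read_text(encoding="utf-8")
    Path(fasta_path).write_text(text_to_fasta(text, description), encoding="utf-8")


print(text_to_fasta("QIKD\nLLVS\n", "demo", width=6), end="")
# prints:
# >demo
# QIKDLL
# VS
```

Writing to a separate output file leaves the original TXT file untouched, which makes it easy to compare the two versions.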
Maureen Bruen is a graduate of Williams College with a bachelor's degree in art history and computer science. She has been writing, programming, designing and doing photography for corporations and local governments since 1999. She started publishing technical manuals for software companies using SQL (Structured Query Language) in 1991.