Genovo version 0.4

In short:
=========
% Assemble the reads in all_reads.fa. Run for 40 iterations.
% Output all contigs with length > 500b to file genovo.fa
assemble all_reads.fa 40
finalize 500 genovo.fa all_reads.fa.dump.best

- Check and run the commented demo script DEMO.sh.

Main program: assemble
======================
- assemble <reads file>
    will run Genovo for 10,000 iterations on the set of reads in <reads file>.
- assemble <reads file> N
    will run Genovo for N iterations.
- assemble <reads file> N <dump file>
    will run Genovo for N iterations, loading the initial state from <dump file>.

Inputs:
  <reads file> is a fasta file of reads.
  <dump file> is the .dump or .dump.best file generated as output by a previous Genovo run.

Outputs:
  The output files are updated during the course of the algorithm.
  It is ok to view them while the algorithm is running.
  It is ok to kill the algorithm, in which case the output files will hold the most recent results.
  The output files are:
  <reads file>.status    - statistics about the current run. A line is written after every iteration.
                           For details see "status file" below.
  <reads file>.dump      - last state achieved. Updated after every 10 whole iterations.
  <reads file>.dump.best - best state achieved. Updated after every 10 whole iterations.

Generate a list of contigs from a dump file
===========================================
- finalize <min length> <output file> <dump file>
    will output to <output file> all the contig sequences in <dump file> that have length
    greater than <min length>.

Example:
finalize 0 output.fa all_reads.fa.dump.best

Compute Score_{denovo}, which scores an assembly
================================================
- compute_score_denovo

Example output:
====================================================================================
* read file: synthetic/all_reads.fa     contig file: synthetic/genovo.fa
* no. reads: 17946                      Percent garbage: 1% (222 reads)
* no. contigs: 9                        total contig length: 68864
* alignment score: -306305              garbage score: -37916.6
* Score: 0.984671    (raw score: -433282 )
====================================================================================

'garbage' refers to reads that did not map to any of the contigs in the contig file.
alignment_score: component of the likelihood score contributed by the mapped reads
garbage_score: component of the likelihood score contributed by the unmapped reads
Score is the normalized score_{denovo}, as defined in the Genovo paper.
Raw score is the unnormalized score.

CONTACT
=======
jonil@stanford.edu
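
Example wrapper script (sketch)
===============================
The assemble and finalize steps shown under "In short" can be chained in a small shell function. This is only an illustrative sketch: genovo_run and its four arguments are hypothetical names that are not part of Genovo, and the script assumes that assemble and finalize are on the PATH and exit with a non-zero status on failure. It also assumes you want to resume from an existing .dump file when one is present, using the three-argument form of assemble described above.

```shell
#!/bin/sh
# Sketch only: genovo_run is a hypothetical helper, not shipped with Genovo.
# Assumes `assemble` and `finalize` are on PATH and return non-zero on failure.

genovo_run() {
    reads=$1      # fasta file of reads
    iters=$2      # number of iterations to run
    min_len=$3    # keep contigs longer than this many bases
    out=$4        # fasta file that receives the contigs

    if [ -f "$reads.dump" ]; then
        # A previous run left a state file: load the initial state from it.
        assemble "$reads" "$iters" "$reads.dump" || return 1
    else
        # No saved state: start a fresh run.
        assemble "$reads" "$iters" || return 1
    fi

    # Write out the contigs from the best state achieved.
    finalize "$min_len" "$out" "$reads.dump.best"
}

# Example call (commented out so the file can be sourced safely):
# genovo_run all_reads.fa 40 500 genovo.fa
```

With no existing all_reads.fa.dump, the example call runs the same two commands as the "In short" section. Remove the .dump file first if you want a fresh run rather than a resumed one.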