biomedical and healthCAre Data Discovery Index Ecosystem
Title: Supporting data for "An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing"
dateReleased:
12-28-2016
privacy:
not applicable
aggregation:
instance of dataset
dateCreated:
12-28-2016
refinement:
curated
ID:
doi:10.5524/100268
creators:
Zimin, Aleksey V
Stevens, Kristian A
Crepeau, Marc W
Puiu, Daniela
Wegrzyn, Jill L
Yorke, James A
Langley, Charles H
Neale, David B
Salzberg, Steven L
availability:
available
types:
sequence
description:
The 22-gigabase genome of loblolly pine (Pinus taeda) is one of the largest ever sequenced. The draft assembly published in 2014 was built entirely from short Illumina reads, with lengths ranging from 100 to 250 base pairs (bp). That assembly was highly fragmented, containing over 11 million contigs whose weighted average (N50) size was 8,206 bp. To improve this result, we generated approximately 12-fold coverage in long reads using the Single Molecule Real Time (SMRT) sequencing technology developed at Pacific Biosciences. We assembled the long and short reads together using the MaSuRCA mega-reads assembly algorithm, which produced a substantially better assembly, P. taeda version 2.0. The new assembly has an N50 contig size of 25,361 bp, more than three times that of the original assembly, and an N50 scaffold size of 107,821 bp, 61% larger than that of the previous assembly.
accessURL: https://doi.org/10.5524/100268
storedIn:
GigaScience Database
qualifier:
not compressed
format:
HTML
accessType:
landing page
primary:
true
authentication:
none
authorization:
none
abbreviation:
GigaDB
homePage: http://gigadb.org/
ID:
SCR:006565
name:
GigaScience Database
