Ecoref: Escherichia coli genetic reference panel

Data download terms and conditions

We encourage anyone to download and use the data provided here. If you do so, please get in touch with us; we are happy to provide guidance in navigating this data.

Citation

Please cite the following paper if you publish any analysis involving the data deposited here:

Phenotype inference in an Escherichia coli strain panel
eLife 2017;6:e31035; doi: 10.7554/eLife.31035

Ecoref bulk data download (release 2, 2022/11/29)

For now, only the updated annotated assemblies are provided. The rest of the data will follow, as will updates to the content of the "strains" and "variants" pages.

Conditions details (TSV format)
Growth phenotypes
All annotated assemblies (in GFF3 format)

Old releases

Release 1

Strains details (TSV format)
Conditions details (TSV format)
Growth phenotypes
SNPs (in VCF format)
SIFT scores for all the nonsynonymous substitutions observed in the strains collection
FoldX scores for all the nonsynonymous substitutions observed in the strains collection
Gene disruption scores for all K-12 coding genes in all strains
Pangenome (computed using Roary)
Strains tree (computed using Parsnp)
All annotated assemblies (in GFF3 format, broken link)
All proteomes (in FASTA format, broken link)

How to read the pangenome table

The Python pandas library makes it relatively simple to load this data. For more complex plots, you might want to look into roary_plots.

import pandas as pd
import numpy as np

# Load the Roary pangenome table (comma-separated)
roary = pd.read_csv('pangenome.csv',
                    low_memory=False)
# Set index (group name)
roary.set_index('Gene', inplace=True)
# Drop the other info columns
roary.drop(list(roary.columns[:13]), axis=1, inplace=True)

# Transform it into a presence/absence matrix (1/0):
# any cell holding a gene identifier becomes 1, empty cells become 0
roary.replace('.{2,100}', 1, regex=True, inplace=True)
roary.replace(np.nan, 0, inplace=True)
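
Once the table is in 1/0 form, simple summaries follow from row and column sums. The sketch below is only an illustration: it uses a small made-up matrix in place of the real `roary` DataFrame, and the strain names, group names, the `summarize_pangenome` helper and its `core_fraction` parameter are all hypothetical.

```python
import pandas as pd

# Toy presence/absence matrix standing in for the `roary` DataFrame:
# rows are gene groups, columns are strains (all names here are made up).
toy = pd.DataFrame(
    {
        "strainA": [1, 1, 0, 1],
        "strainB": [1, 0, 0, 1],
        "strainC": [1, 1, 1, 1],
    },
    index=pd.Index(["group_1", "group_2", "group_3", "group_4"], name="Gene"),
)


def summarize_pangenome(matrix, core_fraction=1.0):
    """Return strains per gene, genes per strain, and the list of core genes."""
    matrix = matrix.astype(int)
    # Number of strains carrying each gene group (row sums)
    strains_per_gene = matrix.sum(axis=1)
    # Number of gene groups found in each strain (column sums)
    genes_per_strain = matrix.sum(axis=0)
    # Call a gene "core" if it is present in at least this many strains
    threshold = core_fraction * matrix.shape[1]
    core_genes = strains_per_gene[strains_per_gene >= threshold].index.tolist()
    return strains_per_gene, genes_per_strain, core_genes


strains_per_gene, genes_per_strain, core_genes = summarize_pangenome(toy)
print(core_genes)
```

With the real data, passing `roary` instead of `toy` should work the same way, provided every cell was converted to 1 or 0 by the steps above. Lowering `core_fraction` (for example to 0.95) gives a more permissive definition of the core genome.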