Data download terms and conditions

We encourage anyone to download and use the data provided here. If you do so, please get in touch with us; we are happy to provide guidance in navigating this data.

Citation

Please cite the following paper if you publish any analysis involving the data deposited here:

Phenotype inference in an Escherichia coli strain panel
eLife 2017;6:e31035; doi: 10.7554/eLife.31035

Ecoref bulk data download (release 2, 2022/11/29)

For now, only the updated annotated assemblies are provided. The rest of the data will follow, including the content of the "strains" and "variants" pages.

Old releases

How to read the pangenome table

The Python pandas library makes it relatively simple to load this data. You might want to look into roary_plots to generate more complex plots.

import pandas as pd

# Load the Roary pangenome table (comma-separated)
roary = pd.read_csv('pangenome.csv', low_memory=False)
# Set the index to the gene cluster (group) name
roary = roary.set_index('Gene')
# Drop the other info columns (the first 13 after the index)
roary = roary.drop(columns=roary.columns[:13])

# Transform it into a presence/absence matrix (1/0):
# non-empty cells become 1, empty cells become 0
roary = roary.notna().astype(int)
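
Once the table is a presence/absence matrix, simple summaries are a matter of row and column sums. The following is a minimal sketch, not part of the original instructions. It builds a small made-up table in memory, so the `info_*` column names, the strain names, and the locus tags are all hypothetical placeholders. It assumes the same layout as above (a 'Gene' column followed by 13 info columns) and counts any non-empty cell as present. The "core" definition used here (present in every strain) is likewise only an assumption for illustration.

```python
import io

import pandas as pd

# Hypothetical toy table: 'Gene', 13 placeholder info columns,
# then three made-up strain columns holding locus tags (empty = absent)
info_cols = [f'info_{i}' for i in range(1, 14)]
header = ['Gene'] + info_cols + ['strainA', 'strainB', 'strainC']
rows = [
    ['group_1'] + ['x'] * 13 + ['A_0001', 'B_0001', 'C_0001'],
    ['group_2'] + ['x'] * 13 + ['A_0002', '', 'C_0002'],
    ['group_3'] + ['x'] * 13 + ['', 'B_0003', ''],
]
toy_csv = '\n'.join(','.join(row) for row in [header] + rows)

# Same loading steps as for the real table, applied to the toy data
roary = pd.read_csv(io.StringIO(toy_csv), low_memory=False)
roary = roary.set_index('Gene')
roary = roary.drop(columns=roary.columns[:13])
matrix = roary.notna().astype(int)

# Number of strains carrying each gene cluster
strains_per_gene = matrix.sum(axis=1)
# Number of gene clusters found in each strain
genes_per_strain = matrix.sum(axis=0)
# Gene clusters present in every strain (strict "core", an assumption)
core = matrix.index[strains_per_gene == matrix.shape[1]]

print(strains_per_gene.to_dict())
print(genes_per_strain.to_dict())
print(list(core))
```

With the real table, replacing the toy input with 'pangenome.csv' should give the same kind of summaries, provided the file follows the layout assumed above.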