We encourage anyone to download and use the data provided here. If you do, please get in touch with us; we are happy to help you navigate the data.
Please cite the following paper if you publish any analysis involving the data deposited here
For now, only the updated annotated assemblies are provided. The rest of the data will follow, including the content of the "strains" and "variants" pages.
The Python pandas library makes it relatively simple to load this data. For more complex plots, you might want to look into roary_plots.
import pandas as pd

# Load the Roary gene presence/absence table
roary = pd.read_csv('pangenome.csv', low_memory=False)
# Use the gene group name as the index
roary = roary.set_index('Gene')
# Drop the remaining 13 info columns, keeping one column per strain
roary = roary.drop(columns=roary.columns[:13])
# Transform it into a presence/absence matrix (1/0):
# 1 where a cell is non-empty, 0 where it is empty
roary = roary.notna().astype(int)
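Once the table is a 1/0 matrix with genes as rows and strains as columns, simple summaries follow from row and column sums. The snippet below is a minimal sketch, not part of the original analysis: the helper `summarize_pangenome`, the 99% core-gene threshold, and the toy strain and gene names are all assumptions made for illustration. In practice you would pass the `roary` matrix in place of `toy`.

```python
import pandas as pd


def summarize_pangenome(matrix: pd.DataFrame, core_fraction: float = 0.99) -> dict:
    """Count core and accessory genes in a 1/0 presence/absence matrix.

    Hypothetical helper: rows are gene groups, columns are strains.
    The 0.99 default threshold is an assumption; adjust it as needed.
    """
    n_strains = matrix.shape[1]
    # Number of strains in which each gene group is present
    strains_per_gene = matrix.sum(axis=1)
    # A gene counts as "core" if present in at least core_fraction of strains
    is_core = strains_per_gene >= core_fraction * n_strains
    return {
        "strains": int(n_strains),
        "genes": int(matrix.shape[0]),
        "core": int(is_core.sum()),
        "accessory": int((~is_core).sum()),
    }


# Toy stand-in for the real matrix (illustrative names only)
toy = pd.DataFrame(
    {"strainA": [1, 1, 0], "strainB": [1, 0, 1], "strainC": [1, 1, 0]},
    index=pd.Index(["geneX", "geneY", "geneZ"], name="Gene"),
)

summary = summarize_pangenome(toy)
# Number of gene groups found in each strain
genes_per_strain = toy.sum(axis=0)
```

Summing along `axis=1` gives, for each gene group, the number of strains that carry it, while `axis=0` gives the gene content of each strain.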