Skip to content

Projects table

When executing a multi-projects run you need to configure the genetic input data in the main configuration file, the other project-specific parameters will be taken from the projects table.

The parameter projects_table is used to activate a multi-projects run by configuring a projects table. The projects table is a tab-separated file with header and the following columns (names are mandatory). You can use NA in the optional columns when the value is not relevant.

  • project_id: a unique identifier for the project
  • pheno_file: path to the phenotype file
  • pheno_cols: comma-separated list of phenotype columns
  • pheno_binary: True if the phenotype is binary, False otherwise
  • pheno_model: association model for this phenotype, can be additive, dominant or recessive
  • cov_file: path to the covariates file, use NO_COV_FILE if no covariates are used
  • cov_cols: comma-separated list of covariates columns (excluding categorical covariates), can be omitted if cov_file is NO_COV_FILE
  • cov_cat_cols (optional): comma-separated list of categorical covariates columns
  • interaction_snp (optional): a single variant ID to perform interaction analysis (GxG)
  • interaction_cov (optional): a single covariate name to perform interaction analysis (GxE)
  • condition_list (optional): a text file containing a list of variants IDs to condition on
  • extract_snps_list (optional): a text file containing a list of variants IDs to restrict GWAS analysis
  • extract_genes_list (optional): a text file containing a list of gene IDs to restrict rare variant analysis

Important note when using conditional or interaction SNP analysis

When you specify condition_list or interaction_snp for one of your project, you must also configure the additional genotype datasets for the conditional analysis. This dataset should contain all the SNPs listed in the condition_list and interaction_snp and it's used to ensure that genetic information for these SNPs are available to all chunks. The same additional dataset is used across all projects at the moment.

You have to set the following parameters:

  • additional_geno_file: prefix of the genotype dataset containing vars in condition_list or interaction snp. This is mandatory for conditional or interaction analysis
  • additional_geno_format: can be bgen, pgen or bed.

Examples

A minimal projects table can be as follows:

project_id pheno_file pheno_cols pheno_binary pheno_model cov_file cov_cols
project1 phenos.txt qpheno1,qpheno2 False additive covars.txt cov1,cov2
project2 phenos.txt bpheno1,bpheno2 True additive NO_COV_FILE NA

An example project table with all columns can be as follows:

project_id pheno_file pheno_cols pheno_binary pheno_model cov_file cov_cols cov_cat_cols interaction_snp interaction_cov condition_list extract_snps_list
project1 phenos.txt qpheno1,qpheno2 False additive covars.txt cov1,cov2 cat_covar1 NA NA NA
project2 phenos.txt qpheno1,qpheno2 False additive covars.txt cov1,cov2 cat_covar1 rs12345 NA conditional_snps.txt
project3 phenos.txt bpheno3,bpheno4 True additive NO_COV_FILE NA NA NA NA NA NA