Projects table
When executing a multi-projects run, you configure the genetic input data in the main configuration file; the other project-specific parameters are taken from the projects table.
Setting the projects_table parameter to point to a projects table activates a multi-projects run. The projects table is a tab-separated file with a header and the following columns (column names are mandatory). Use NA in the optional columns when a value is not relevant.
- project_id: a unique identifier for the project
- pheno_file: path to the phenotype file
- pheno_cols: comma-separated list of phenotype columns
- pheno_binary: True if the phenotype is binary, False otherwise
- pheno_model: association model for this phenotype, can be additive, dominant or recessive
- cov_file: path to the covariates file, use NO_COV_FILE if no covariates are used
- cov_cols: comma-separated list of covariate columns (excluding categorical covariates), can be omitted if cov_file is NO_COV_FILE
- cov_cat_cols (optional): comma-separated list of categorical covariate columns
- interaction_snp (optional): a single variant ID to perform interaction analysis (GxG)
- interaction_cov (optional): a single covariate name to perform interaction analysis (GxE)
- condition_list (optional): a text file containing a list of variant IDs to condition on
- extract_snps_list (optional): a text file containing a list of variant IDs to restrict GWAS analysis
- extract_genes_list (optional): a text file containing a list of gene IDs to restrict rare variant analysis
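The column rules above can be checked before launching a run. The following is a minimal sketch in Python, not the pipeline's own validation: the function name `read_projects_table` and the choice to require `cov_cols` in the header are assumptions made for illustration. It verifies that the mandatory column names are present, that each `project_id` is unique, and it treats NA in optional columns as "not set".

```python
import csv
import io

# Column names are taken from the list above. The validation logic is an
# illustrative assumption, not the pipeline's actual check.
REQUIRED_COLS = [
    "project_id", "pheno_file", "pheno_cols", "pheno_binary",
    "pheno_model", "cov_file", "cov_cols",
]
OPTIONAL_COLS = [
    "cov_cat_cols", "interaction_snp", "interaction_cov",
    "condition_list", "extract_snps_list", "extract_genes_list",
]


def read_projects_table(handle):
    """Parse a tab-separated projects table and return a list of row dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLS if c not in header]
    if missing:
        raise ValueError(f"missing mandatory columns: {missing}")
    rows = []
    seen = set()
    for row in reader:
        if row["project_id"] in seen:
            raise ValueError(f"duplicate project_id: {row['project_id']}")
        seen.add(row["project_id"])
        # Treat NA, an empty cell, or an absent optional column as "not set".
        for col in OPTIONAL_COLS:
            if row.get(col) in (None, "", "NA"):
                row[col] = None
        rows.append(row)
    return rows


# In-memory example so the sketch runs without any files on disk.
example = (
    "project_id\tpheno_file\tpheno_cols\tpheno_binary\tpheno_model\tcov_file\tcov_cols\n"
    "project1\tphenos.txt\tqpheno1,qpheno2\tFalse\tadditive\tcovars.txt\tcov1,cov2\n"
    "project2\tphenos.txt\tbpheno1,bpheno2\tTrue\tadditive\tNO_COV_FILE\tNA\n"
)
projects = read_projects_table(io.StringIO(example))
```

To check a real file, pass an open file handle instead of the `io.StringIO` object.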
Important note when using conditional or interaction SNP analysis
When you specify condition_list or interaction_snp for one of your projects, you must also configure an additional genotype dataset. This dataset should contain all the SNPs listed in condition_list and interaction_snp, and it is used to ensure that genetic information for these SNPs is available to all chunks. At the moment, the same additional dataset is used across all projects.
You have to set the following parameters:
- additional_geno_file: prefix of the genotype dataset containing the variants in condition_list or interaction_snp. This is mandatory for conditional or interaction SNP analysis.
- additional_geno_format: can be bgen, pgen or bed.
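The requirement above can be expressed as a small pre-flight check. This is a sketch under stated assumptions: the helper names `is_set` and `check_additional_geno`, the dictionary layout of `params`, and the prefix `extra_snps` are hypothetical and only serve the illustration. It reports which projects use condition_list or interaction_snp and fails if the two additional_geno parameters are not usable.

```python
VALID_FORMATS = {"bgen", "pgen", "bed"}


def is_set(value):
    """A cell counts as set unless it is missing, empty, or NA."""
    return value not in (None, "", "NA")


def check_additional_geno(projects, params):
    """Return the project IDs that need the additional genotype dataset.

    Raises ValueError if any project needs it but the parameters are unusable.
    """
    needing = [
        p["project_id"]
        for p in projects
        if is_set(p.get("condition_list")) or is_set(p.get("interaction_snp"))
    ]
    if needing:
        if not params.get("additional_geno_file"):
            raise ValueError(
                f"additional_geno_file is required for projects: {needing}"
            )
        if params.get("additional_geno_format") not in VALID_FORMATS:
            raise ValueError("additional_geno_format must be bgen, pgen or bed")
    return needing


# Hypothetical rows and parameter values, for illustration only.
projects = [
    {"project_id": "project1", "interaction_snp": "NA", "condition_list": "NA"},
    {"project_id": "project2", "interaction_snp": "rs12345",
     "condition_list": "conditional_snps.txt"},
]
params = {"additional_geno_file": "extra_snps", "additional_geno_format": "bgen"}
needing = check_additional_geno(projects, params)
```

Because the same additional dataset is shared by all projects, the check looks at the whole table at once rather than validating each project separately.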
Examples
A minimal projects table can be as follows:
| project_id | pheno_file | pheno_cols | pheno_binary | pheno_model | cov_file | cov_cols |
|---|---|---|---|---|---|---|
| project1 | phenos.txt | qpheno1,qpheno2 | False | additive | covars.txt | cov1,cov2 |
| project2 | phenos.txt | bpheno1,bpheno2 | True | additive | NO_COV_FILE | NA |
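The table above is rendered with pipes for readability, while the actual file is tab-separated. As a rough sketch, the same minimal table could be produced with Python's standard csv module; the output file name `projects.tsv` in the comment is a hypothetical choice.

```python
import csv
import io

header = ["project_id", "pheno_file", "pheno_cols", "pheno_binary",
          "pheno_model", "cov_file", "cov_cols"]
rows = [
    ["project1", "phenos.txt", "qpheno1,qpheno2", "False", "additive",
     "covars.txt", "cov1,cov2"],
    ["project2", "phenos.txt", "bpheno1,bpheno2", "True", "additive",
     "NO_COV_FILE", "NA"],
]

# Write to an in-memory buffer; commas inside a cell are left unquoted
# because the delimiter is a tab.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
tsv_text = buffer.getvalue()

# To write to disk instead (projects.tsv is a hypothetical file name):
# with open("projects.tsv", "w", newline="") as fh:
#     fh.write(tsv_text)
```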
An example projects table including optional columns (extract_genes_list is not shown) can be as follows:
| project_id | pheno_file | pheno_cols | pheno_binary | pheno_model | cov_file | cov_cols | cov_cat_cols | interaction_snp | interaction_cov | condition_list | extract_snps_list |
|---|---|---|---|---|---|---|---|---|---|---|---|
| project1 | phenos.txt | qpheno1,qpheno2 | False | additive | covars.txt | cov1,cov2 | cat_covar1 | NA | NA | NA | NA |
| project2 | phenos.txt | qpheno1,qpheno2 | False | additive | covars.txt | cov1,cov2 | cat_covar1 | rs12345 | NA | conditional_snps.txt | NA |
| project3 | phenos.txt | bpheno3,bpheno4 | True | additive | NO_COV_FILE | NA | NA | NA | NA | NA | NA |