Main parameters
A template for preparing a configuration file for a new project is provided in templates/single_project.conf. You can copy this file and edit it to adjust the parameters for your analysis.
Here is a list of the main parameters you need to adjust for a new analysis:
- Set project to a custom string defining your project ID. The ID can be at most 50 characters long and must not contain special characters. It is used in reports, and a folder with the same name is created to store all results.
- Set chromosomes to the list of chromosomes to include in the analysis. You can use a comma-separated list of chromosome numbers, a range like 1-22, or a mix like 1,4,11-18.
- Optionally, you can restrict the analysis to a specific genomic range, or to a specific list of SNP IDs or gene IDs, using the regenie_range, regenie_extract_snps and regenie_extract_genes options. This requires step2_gwas_split and step2_rarevar_split to be set to false. In this case you must ensure that the provided SNPs, genes or region are present in the input genotype dataset and in the specified chromosomes. Otherwise, the pipeline will fail because no SNPs remain for the step2 analysis.
- Set genotypes_build to the build of your genotype data, either hg19 or hg38.
- Set step2_gwas_chunk_size and step2_rarevar_chunk_size according to the size of your dataset. These control how the dataset is split for the step2 analysis. The default values usually work fine, but you can increase or decrease them if you have a very large or very small dataset. Keep in mind that a small chunk size may result in a very large number of parallel jobs. By default, the pipeline job submission rate is limited to 200 concurrent jobs. The total number of jobs will be N_SNPs / step2_gwas_chunk_size for GWAS and N_genes / step2_rarevar_chunk_size for rare variant analysis.
- For GWAS analysis, set regenie_gwas_test to the type of model you want to test, either 'additive', 'dominant' or 'recessive'.
- For rare variant tests, set rarevars_vc_test to a comma-separated list of the rare variant tests to perform, chosen among those accepted by regenie: skat, skato, skato-acat, acatv, acato, acato-full. You can also set rarevars_joint_test to perform a joint test across burden masks / bins. Possible values are: minp, acat, sbat.
- Set regenie_gwas_min_mac and regenie_rarevar_min_mac to control the minimum allowed MAC for variants tested in GWAS and rare variant analysis, respectively. Variants with MAC below this threshold are excluded from the analysis.
- Set annotation_min_log10p to the minimum -log10(p-value) for top hit SNPs from GWAS results. These SNPs are also annotated in the Manhattan plot in the HTML report.
- Set rarevar_min_log10p to the minimum -log10(p-value) for top hit genes from rare variant analysis. These genes are also annotated in the general Manhattan plot in the HTML report.
- Set clump_p1 to the maximum p-value allowed for index SNPs during PLINK clumping to define top loci.
- If you are analyzing many phenotypes on a large dataset, we suggest setting make_report to false, or at least disabling the generation of the locus zoom plots by setting n_top_loci_plot to zero. Generating the HTML graphical report, and especially the regional plots, can add a considerable amount of time when many phenotypes are tested together on a large dataset.
- If you have categorical covariates, the maximum number of allowed categories is set to 10 by default. You can adjust this using the maxCatLevels parameter.
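To get a feel for the chromosomes syntax and for how chunk size drives the number of jobs, the following Python sketch may help. It is not part of the pipeline: expand_chromosomes and estimate_jobs are hypothetical helper names, the parser assumes numeric chromosome labels only, and the job estimate simply rounds up the ratio given above. The SNP count and chunk size in the usage lines are made-up numbers, not pipeline defaults.

```python
def expand_chromosomes(spec: str) -> list:
    """Expand a spec such as '1,4,11-18' into a sorted list of chromosome numbers.

    Assumption: only numeric chromosome labels are used, as in the examples above.
    """
    chromosomes = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            chromosomes.update(range(start, end + 1))
        else:
            chromosomes.add(int(part))
    return sorted(chromosomes)


def estimate_jobs(n_items: int, chunk_size: int) -> int:
    """Rough number of step2 jobs: items divided by chunk size, rounded up."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return (n_items + chunk_size - 1) // chunk_size  # integer ceiling division


# Hypothetical numbers for illustration only.
print(expand_chromosomes("1,4,11-18"))   # [1, 4, 11, 12, 13, 14, 15, 16, 17, 18]
print(estimate_jobs(8_000_000, 100_000))  # 80
```

Comparing the estimate with the 200 concurrent jobs limit gives a quick indication of whether a chunk size is likely to produce a long job queue.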
Input files
Set the input file parameters as described in the input files sections. These cover:
- The full genetic dataset to perform GWAS analysis
- The reduced SNPs dataset to perform REGENIE step1 regression
- The rare variants dataset and annotations to perform rare variant tests
- An optional set of files representing your LD panel to perform loci clumping
Multi-models run
In case you want to configure a multi-models run you also need to set the following parameters:
- Set models_table to a tab-separated file defining the models to test.
- Set missing_tolerance to the maximum allowed fraction of missing phenotype values when collecting uniform groups of phenotypes for a run.
Multi-projects run
In case you want to configure a multi-projects run you also need to set the following parameters:
- Set projects_table to a tab-separated file defining the projects to test.
Conditional and interaction analysis
In case you want to perform conditional or interaction analysis, you can configure the following parameters:
- interaction_cov: run a GxE test, specifying the interacting covariate from the covariate table
- interaction_snp: run a GxG test, specifying the interacting variant ID
- condition_list: run a conditional analysis, specifying a file with variant IDs to condition on
Note that to perform conditional/interaction analysis, an additional genotype dataset must be provided containing the SNP(s) used for conditioning/interaction. This can be configured using the following parameters:
- additional_geno_file: prefix of the genotype dataset containing the variants in condition_list or the interaction variant. This is mandatory for conditional or interaction analysis.
- additional_geno_format: can be bgen, pgen or bed.
See the conditional analysis section for more details.
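The dependency between these options can be checked before launching a run. The Python sketch below is only an illustration, not pipeline code: check_conditional_settings is a hypothetical helper, the settings are assumed to be available as a plain dictionary, and the rule that the additional genotype dataset is mandatory is applied literally to all three options listed above.

```python
ALLOWED_GENO_FORMATS = {"bgen", "pgen", "bed"}
ANALYSIS_KEYS = ("interaction_cov", "interaction_snp", "condition_list")


def check_conditional_settings(params: dict) -> list:
    """Return a list of problems found in conditional/interaction settings."""
    problems = []
    requested = [key for key in ANALYSIS_KEYS if params.get(key)]
    if not requested:
        return problems  # no conditional or interaction analysis requested

    if not params.get("additional_geno_file"):
        problems.append(
            "additional_geno_file is required when "
            + ", ".join(requested)
            + " is set"
        )

    geno_format = params.get("additional_geno_format")
    if geno_format not in ALLOWED_GENO_FORMATS:
        problems.append(
            f"additional_geno_format must be one of "
            f"{sorted(ALLOWED_GENO_FORMATS)}, got {geno_format!r}"
        )
    return problems


# Hypothetical settings: a conditional analysis without the extra dataset.
for problem in check_conditional_settings({"condition_list": "snps.txt"}):
    print(problem)
```

Returning a list of problems rather than raising on the first one lets all misconfigured options be reported in a single pass.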