2022/03/02

Plink memo

<multiple phenotype>

plink --file mydata --pheno pheno2.txt --pheno-name bmi --assoc

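will select the phenotype labelled "bmi" in the header row of pheno2.txt (here, the second phenotype column) for analysis. --pheno-name requires a header row whose first two fields are FID and IID.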
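As a rough illustration of selecting a phenotype by its header label, the Python sketch below looks up a column by name. The file contents and the helper name phenotype_column are hypothetical, made up for this memo; this is not PLINK's own code.

```python
# Hypothetical contents of pheno2.txt: a header row, then one line per individual.
PHENO_TEXT = """\
FID IID qt1 bmi site
F1 1 2.3 22.2 2
F2 1 3.1 18.2 1
"""


def phenotype_column(text, name):
    """Return (phenotype_number, {(FID, IID): value}) for the named column."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    if header[:2] != ["FID", "IID"]:
        raise ValueError("header row must start with FID IID")
    col = header.index(name)  # raises ValueError if the label is absent
    values = {}
    for line in lines[1:]:
        fields = line.split()
        values[(fields[0], fields[1])] = fields[col]
    # Phenotypes are counted from the third column, so column index 2 is phenotype 1.
    return col - 1, values


number, bmi = phenotype_column(PHENO_TEXT, "bmi")
print(number, bmi)
```

With this made-up header, "bmi" is the second phenotype column, which is why selecting it by name and selecting phenotype number 2 would pick the same values.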

If there is more than one phenotype, basic association tests can be run on all phenotypes sequentially, with the output for each phenotype sent to a different file. E.g. if bigpheno.raw contains 10,000 phenotypes, then

plink --bfile mydata --assoc --pheno bigpheno.raw --all-pheno

will loop over all of these, one at a time, testing each for association with every SNP, which generates a lot of output. In this case you might want to add the --pfilter option to report only results with a p-value below a given threshold, e.g. --pfilter 1e-3.
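The effect of a p-value filter can be sketched in Python as below. The table contents and the function name pfilter are hypothetical, and the strict "less than" comparison and the skipping of NA rows are assumptions about the intended behavior, not a description of PLINK's internals.

```python
# Hypothetical association output; only the column headed "P" matters here.
ASSOC_TEXT = """\
CHR SNP BP P
1 rs1 1000 0.5
1 rs2 2000 0.0004
2 rs3 3000 NA
2 rs4 4000 0.00001
"""


def pfilter(text, threshold):
    """Keep rows whose P value is below threshold; skip rows with a non-numeric P."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    p_col = header.index("P")
    kept = []
    for line in lines[1:]:
        fields = line.split()
        try:
            p = float(fields[p_col])
        except ValueError:
            continue  # e.g. "NA": no test result for this SNP
        if p < threshold:
            kept.append(fields)
    return kept


hits = pfilter(ASSOC_TEXT, 1e-3)
print([row[1] for row in hits])
```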


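The --merge option can also be used with binary filesets (BED/BIM/FAM), either as the first input or as the output, but not as the second file (the one being merged in), which must be a PED/MAP pair; use --bmerge to merge in a binary fileset. I.e.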

plink --bfile data1 --merge data2.ped data2.map --make-bed --out merge


For example, suppose we had 4 PED/MAP filesets (labelled fA.* through fD.*) and 4 binary filesets (labelled fE.* through fH.*), with allfiles.txt listing every fileset to merge in except fA, one fileset per line. Then using the command

plink --file fA --merge-list allfiles.txt --make-bed --out mynewdata

would create the binary fileset

     mynewdata.bed

     mynewdata.bim

     mynewdata.fam
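The memo does not record what allfiles.txt looked like, so here is a minimal Python sketch that builds it, assuming the usual merge-list layout: one fileset per line, two file names for a PED/MAP pair and three for a BED/BIM/FAM triple. fA is left out because it is already given via --file. The variable names are made up for illustration.

```python
# Filesets to merge into fA (fA itself is given on the command line via --file).
text_filesets = ["fB", "fC", "fD"]          # PED/MAP pairs: two names per line
binary_filesets = ["fE", "fF", "fG", "fH"]  # BED/BIM/FAM triples: three names per line

lines = [f"{name}.ped {name}.map" for name in text_filesets]
lines += [f"{name}.bed {name}.bim {name}.fam" for name in binary_filesets]

# Text to save as allfiles.txt, one fileset per line.
merge_list = "\n".join(lines) + "\n"
print(merge_list, end="")
```

Saving the printed text as allfiles.txt gives a seven-line list, one line for each fileset other than fA.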


To analyse only a specific chromosome use

plink --file data --chr 6


<SNP range (--from and --to)>

To select a specific range of markers (that must all fall on the same chromosome) use, for example:

plink --bfile mydata --from rs273744 --to rs89883


To keep only a subset of SNPs, either to make a new file or to run an analysis on that subset, list the required SNP IDs in a file (one per line) and use the command

plink --file data --extract mysnps.txt


<range file>

Alternatively, you can use the --range option to modify the behavior of --extract and --exclude. If the --range flag is added, PLINK expects a list of chromosomal ranges, one per line, instead of a list of SNPs.

plink --file data --extract myrange.txt --range

All SNPs within those ranges will then be extracted (or excluded, if --exclude is used). The format of myrange.txt should be one range per line, with whitespace-separated fields:

     CHR     Chromosome code (1-22, X, Y, XY, MT, 0)

     BP1     Start of range, physical position in base pairs

     BP2     End of range, as above

     LABEL   Name of range/gene

For example,

     2 30000000 35000000  R1

     2 60000000 62000000  R2

     X 10000000 20000000  R3
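To make the range format concrete, the following Python sketch parses the three example ranges and checks which SNPs fall inside them. The SNP names and positions are hypothetical, and treating both endpoints as inclusive is an assumption rather than something stated in this memo.

```python
RANGE_TEXT = """\
2 30000000 35000000  R1
2 60000000 62000000  R2
X 10000000 20000000  R3
"""


def parse_ranges(text):
    """Parse whitespace-separated CHR BP1 BP2 LABEL lines into tuples."""
    ranges = []
    for line in text.splitlines():
        if not line.strip():
            continue
        chrom, bp1, bp2, label = line.split()
        ranges.append((chrom, int(bp1), int(bp2), label))
    return ranges


def matching_label(chrom, bp, ranges):
    """Return the label of the first range containing the position, else None.

    Both endpoints are treated as inclusive (an assumption).
    """
    for r_chrom, bp1, bp2, label in ranges:
        if chrom == r_chrom and bp1 <= bp <= bp2:
            return label
    return None


ranges = parse_ranges(RANGE_TEXT)

# Hypothetical SNPs as (name, chromosome code, base-pair position).
snps = [
    ("snpA", "2", 31000000),
    ("snpB", "2", 50000000),
    ("snpC", "X", 15000000),
    ("snpD", "1", 31000000),
]
extracted = [name for name, chrom, bp in snps if matching_label(chrom, bp, ranges)]
print(extracted)
```

Chromosome codes are compared as strings so that X, Y, XY and MT work the same way as numeric codes.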


<VIF: variance inflation factor>

A VIF of 10 is often taken to indicate near-collinearity problems in standard multiple regression analyses.

A VIF of 1 would imply that the SNP is completely independent of all other SNPs. In practice, a VIF threshold between 1.5 and 2 should probably be used when pruning SNPs (e.g. with --indep).
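For reference, the standard definition is VIF = 1 / (1 - R^2), where R^2 is the multiple correlation coefficient from regressing one SNP on the other SNPs. The small Python sketch below converts between the two, which helps when reading the thresholds above; the function names are made up for illustration.

```python
def vif_from_r2(r2):
    """Variance inflation factor for a given multiple R^2 (0 <= R^2 < 1)."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("R^2 must be in [0, 1)")
    return 1.0 / (1.0 - r2)


def r2_from_vif(vif):
    """Multiple R^2 implied by a given VIF (VIF >= 1)."""
    if vif < 1.0:
        raise ValueError("VIF must be at least 1")
    return 1.0 - 1.0 / vif


for threshold in (1.0, 1.5, 2.0, 10.0):
    print(f"VIF {threshold:>4} -> R^2 {r2_from_vif(threshold):.3f}")
```

So a VIF of 2 corresponds to R^2 = 0.5 and a VIF of 10 to R^2 = 0.9, while a VIF of 1 means R^2 = 0, i.e. no linear dependence on the other SNPs.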