2022/02/23

PLINK memo


http://ibgwww.colorado.edu/cdrom2009/ScriptsA/purcell/GWAS/instruct.pdf

 NGS --> vcf file  

     row: SNPs  col: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, SAMPLE1, SAMPLE2, SAMPLE3, ... SAMPLE k

--> ped file and map file 

     ped file row: SAMPLE  col: FAMILYID, SAMPLEID, FATHERID, MOTHERID, SEX, PHENOTYPE, SNP1, SNP2, ... SNP n (two alleles per SNP)

     map file row: SNPs  col: chromosome, SNP ID, genetic distance (cM), base-pair position

--> bed file, bim file and fam file

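     bed file: binary genotype data (samples x SNPs)

     bim file row: SNPs  col: chromosome, SNP ID, genetic distance (cM), base-pair position, allele 1, allele 2

     fam file row: SAMPLE  col: FAMILYID, SAMPLEID, FATHERID, MOTHERID, SEX, PHENOTYPE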
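
A quick sanity check of the ped/map relationship described above, as a sketch on made-up toy data. The file names (toy.ped, toy.map, toy.check) and the sample and SNP IDs are invented for illustration. It assumes the standard ped layout of six leading columns followed by two alleles per SNP.

```shell
# Toy map file: one row per SNP (chromosome, SNP ID, cM, bp position).
cat > toy.map <<'EOF'
1 rs1 0 1000
1 rs2 0 2000
1 rs3 0 3000
EOF

# Toy ped file: one row per sample; 6 leading columns, then 2 alleles per SNP.
cat > toy.ped <<'EOF'
FAM1 S1 0 0 1 2 A A G G C T
FAM2 S2 0 0 2 1 A G G T C C
EOF

# Number of SNPs = number of rows in the map file.
n_snps=$(awk 'END { print NR }' toy.map)

# Every ped row should have 6 + 2 * (number of SNPs) fields.
awk -v n="$n_snps" 'NF != 6 + 2 * n { bad++ } END { print (bad ? "MISMATCH" : "OK") }' toy.ped > toy.check
cat toy.check
```

With a real ped/map pair, plink --file <prefix> --make-bed --out <prefix> should then produce the matching bed, bim and fam files.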

  

--file <prefix> reads <prefix>.ped and <prefix>.map

--bfile <prefix> reads <prefix>.bed, <prefix>.bim and <prefix>.fam

File format:  https://www.cog-genomics.org/plink/1.9/formats


validation of files

plink --bfile <file>  --out validate

calculate MAF (minor allele frequency); output goes to freq1.frq

plink --bfile <file> --freq --out freq1

extract only SNPs with MAF >= 0.05 (skip the header line and NA rows)

awk 'NR > 1 && $5 != "NA" && $5 >= 0.05 { print $2 }' freq1.frq > mylist.snps

or 

plink --bfile <file> --chr XX --maf 0.05 --write-snplist --out my-list  (writes my-list.snplist: the same kind of SNP list, restricted to chromosome XX)
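
To see what the awk filter keeps, here is a small sketch on a made-up frequency table. The file names toy.frq and toy.snps and all values are invented; the column layout (CHR SNP A1 A2 MAF NCHROBS) is assumed to match what --freq writes.

```shell
# Toy allele-frequency table in the .frq layout (CHR SNP A1 A2 MAF NCHROBS).
cat > toy.frq <<'EOF'
 CHR  SNP   A1   A2          MAF  NCHROBS
   1  rs1    A    G       0.0100      200
   1  rs2    C    T       0.0500      200
   1  rs3    G    A       0.2500      198
   1  rs4    T    C           NA        0
EOF

# Skip the header and NA rows, keep SNP IDs with MAF >= 0.05.
awk 'NR > 1 && $5 != "NA" && $5 >= 0.05 { print $2 }' toy.frq > toy.snps
cat toy.snps
```

This prints rs2 and rs3. Without the NR > 1 and NA guards, awk would compare the text "MAF" and "NA" as strings and let the header and rs4 through.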


Hardy-Weinberg equilibrium

plink --bfile <file> --hardy --out hwe1
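
As a rough illustration of what the GENO, O(HET) and E(HET) columns in the --hardy output mean, this sketch computes observed and expected heterozygosity from made-up genotype counts. It uses the textbook 2pq expectation and is an assumption about the idea, not a reproduction of PLINK's own calculation; the p-value PLINK reports is not computed here. The file name toy.hwe.txt is invented.

```shell
# Observed genotype counts for one SNP: A1A1 / A1A2 / A2A2 (made-up numbers).
geno="10/40/50"

echo "$geno" | awk -F'/' '{
    n = $1 + $2 + $3                 # number of genotyped samples
    p = (2 * $1 + $2) / (2 * n)      # frequency of allele A1
    # Observed het rate vs. the Hardy-Weinberg expectation 2pq.
    printf "O(HET)=%.4f E(HET)=%.4f\n", $2 / n, 2 * p * (1 - p)
}' > toy.hwe.txt
cat toy.hwe.txt
```

Here p = 60/200 = 0.3, so the expected heterozygosity is 2 x 0.3 x 0.7 = 0.42 against an observed 0.40.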


All together

plink --bfile <file> --maf 0.01 --geno 0.05 --mind 0.05 --hwe 1e-3 --make-bed --out wgas3

wgas3.irem lists the samples removed by --mind
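
A sketch of what the --mind 0.05 threshold means, using a made-up per-sample missingness table. It assumes the layout that plink --missing writes to its .imiss file (FID IID MISS_PHENO N_MISS N_GENO F_MISS); --missing is not used elsewhere in this memo, and the file names toy.imiss and toy.removed are invented.

```shell
# Toy per-sample missingness table (FID IID MISS_PHENO N_MISS N_GENO F_MISS).
cat > toy.imiss <<'EOF'
FID  IID MISS_PHENO N_MISS N_GENO F_MISS
FAM1 S1  N          10     1000   0.01
FAM2 S2  N          80     1000   0.08
FAM3 S3  N          50     1000   0.05
EOF

# Samples whose missing rate exceeds 0.05 are the ones --mind 0.05 would drop.
awk 'NR > 1 && $6 > 0.05 { print $1, $2 }' toy.imiss > toy.removed
cat toy.removed
```

Only FAM2 S2 is listed; S3 sits exactly at 0.05 and is kept, because the filter removes samples whose missing rate exceeds the threshold.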