http://ibgwww.colorado.edu/cdrom2009/ScriptsA/purcell/GWAS/instruct.pdf
NGS --> vcf file
vcf file row: variants (SNPs) col: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, SAMPLE1, SAMPLE2, SAMPLE3, ... SAMPLEk
--> ped file and map file
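ped file row: SAMPLE col: FAMILYID, SAMPLEID, PATERNALID, MATERNALID, SEX, PHENOTYPE, SNP1 (2 allele columns), SNP2, ... SNP n
map file row: SNPs col: chromosome, SNP ID, genetic distance (0 if unknown), base-pair position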
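A minimal sketch of the two text formats, assuming a made-up fileset named toy.ped/toy.map (hypothetical IDs and genotypes, not real data). It writes the files and checks the rule that ties them together: every ped row has 6 + 2 × (number of map rows) columns. The plink command in the first comment is the usual PLINK 1.9 way to get a ped/map pair from a vcf; check it against your PLINK version.

```shell
# Real data (PLINK 1.9): plink --vcf <file>.vcf --recode --out <file>
# Below: a hypothetical toy fileset, only to show the layout.

# map: chromosome, SNP ID, genetic distance, base-pair position
cat > toy.map <<'EOF'
1 rs0001 0 10000
1 rs0002 0 20000
2 rs0003 0 15000
EOF

# ped: FID IID father mother sex(1=male,2=female) phenotype(1=control,2=case),
# then two allele columns per SNP, in map-file order ("0 0" = missing)
cat > toy.ped <<'EOF'
FAM1 S1 0 0 1 2 A G G G C C
FAM2 S2 0 0 2 1 A A G T C T
FAM3 S3 0 0 1 1 A A T T 0 0
FAM4 S4 0 0 2 2 A G G G C C
EOF

# Every ped row must have 6 + 2 * (number of map rows) columns
n_snps=$(wc -l < toy.map)
awk -v n="$n_snps" 'NF != 6 + 2 * n { bad++ }
    END { print (bad ? "column count mismatch" : "ok") }' toy.ped
```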
--> bed file, bim file and fam file
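bed file binary genotype calls (every SAMPLE at every SNP), not human-readable
bim file row: SNPs col: chromosome, SNP ID, genetic distance, base-pair position, allele 1, allele 2
fam file row: SAMPLE col: FAMILYID, SAMPLEID, PATERNALID, MATERNALID, SEX, PHENOTYPE (first six ped columns)
--file <file> reads <file>.ped and <file>.map
--bfile <file> reads <file>.bed, <file>.bim and <file>.fam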
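Going from the text fileset to the binary one is a single PLINK call (shown in the comment, PLINK 1.9 syntax). Of the three outputs, only the .fam is easy to reproduce by hand: it is the first six ped columns. The sketch below does that with awk on a hypothetical toy.ped; toy_derived.fam is an illustrative name, and the .bed/.bim contents are left to PLINK.

```shell
# Real data (PLINK 1.9): plink --file <file> --make-bed --out <file>

# Hypothetical toy ped (FID IID father mother sex phenotype + genotypes)
cat > toy.ped <<'EOF'
FAM1 S1 0 0 1 2 A G G G C C
FAM2 S2 0 0 2 1 A A G T C T
FAM3 S3 0 0 1 1 A A T T 0 0
FAM4 S4 0 0 2 2 A G G G C C
EOF

# A .fam row holds the first six ped columns of the same sample
awk '{ print $1, $2, $3, $4, $5, $6 }' toy.ped > toy_derived.fam
cat toy_derived.fam
```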
File format: https://www.cog-genomics.org/plink/1.9/formats
validation of files
plink --bfile <file> --out validate
calculate MAF (writes freq1.frq; columns CHR, SNP, A1, A2, MAF, NCHROBS)
plink --bfile <file> --freq --out freq1
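The MAF column is, per SNP, the count of allele A1 (usually the minor allele) divided by the number of non-missing allele observations (NCHROBS). A rough awk sketch of that arithmetic on a hypothetical toy.map/toy.ped pair (made-up genotypes). It ignores PLINK details such as haploid chromosomes, so treat it as an illustration of the formula, not a replacement for --freq.

```shell
# Hypothetical toy fileset (same layout as a real ped/map pair)
cat > toy.map <<'EOF'
1 rs0001 0 10000
1 rs0002 0 20000
2 rs0003 0 15000
EOF
cat > toy.ped <<'EOF'
FAM1 S1 0 0 1 2 A G G G C C
FAM2 S2 0 0 2 1 A A G T C T
FAM3 S3 0 0 1 1 A A T T 0 0
FAM4 S4 0 0 2 2 A G G G C C
EOF

# MAF = (count of the less common allele) / (non-missing allele observations)
awk 'NR == FNR { id[FNR] = $2; n = FNR; next }   # first file (map): SNP IDs in order
     {
       for (j = 1; j <= n; j++)                  # SNP j sits in ped fields 5+2j and 6+2j
         for (k = 0; k <= 1; k++) {
           a = $(5 + 2 * j + k)
           if (a != "0") { if (!((j, a) in cnt)) nal[j]++; cnt[j, a]++; tot[j]++ }
         }
     }
     END {
       for (key in cnt) {                        # smallest allele count per SNP
         split(key, part, SUBSEP)
         if (!(part[1] in minc) || cnt[key] < minc[part[1]]) minc[part[1]] = cnt[key]
       }
       for (j = 1; j <= n; j++)                  # monomorphic SNP -> MAF 0
         printf "%s %.4f %d\n", id[j], (nal[j] > 1 ? minc[j] / tot[j] : 0), tot[j]
     }' toy.map toy.ped > toy_maf.txt
cat toy_maf.txt
```

With these toy genotypes the output is `rs0001 0.2500 8`, `rs0002 0.3750 8`, `rs0003 0.1667 6`; rs0003 has only 6 observations because S3 is missing there.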
extract only SNPs with MAF >= 0.05 (column 5 of freq1.frq)
awk 'NR > 1 && $5 >= 0.05 { print $2 }' freq1.frq > mylist.snps
or
plink --bfile <file> --chr XX --maf 0.05 --write-snplist --out my-list (same list, written to my-list.snplist, but limited to chromosome XX; drop --chr XX to cover all chromosomes)
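The awk route can be checked on a tiny hand-written .frq (hypothetical values; real files come from --freq). The NR > 1 guard matters: without it awk compares the header text MAF as a string, the test passes, and the header word SNP ends up in the list.

```shell
# Hypothetical freq1.frq with the PLINK .frq header (values made up)
cat > freq1.frq <<'EOF'
 CHR    SNP  A1  A2    MAF  NCHROBS
   1 rs0001   G   A   0.25        8
   1 rs0002   T   G   0.03        8
   2 rs0003   T   C   0.05        6
EOF

# Keep SNP IDs (column 2) with MAF (column 5) >= 0.05; NR > 1 skips the header
awk 'NR > 1 && $5 >= 0.05 { print $2 }' freq1.frq > mylist.snps
cat mylist.snps
```

This keeps rs0001 and rs0003. The list can then be passed to PLINK, e.g. `plink --bfile <file> --extract mylist.snps --make-bed --out <new>`, to keep only those SNPs.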
Hardy-Weinberg equilibrium test (writes hwe1.hwe)
plink --bfile <file> --hardy --out hwe1
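hwe1.hwe gives, per SNP, the genotype counts (GENO, as A1A1/A1A2/A2A2), the observed and expected heterozygosity O(HET) and E(HET), and the exact-test P. Under Hardy-Weinberg the expected heterozygosity is 2p(1 - p) with p = (2·n11 + n12) / 2N. A small awk sketch of that arithmetic on made-up GENO strings (hypothetical SNP IDs and counts); it does not reproduce the exact-test P that --hwe filters on.

```shell
# Hypothetical genotype counts in the .hwe GENO format (A1A1/A1A2/A2A2)
cat > toy_geno.txt <<'EOF'
rs0001 10/40/50
rs0002 25/50/25
rs0003 40/0/60
EOF

# O(HET) = n12 / N ; p = (2*n11 + n12) / (2N) ; E(HET) = 2p(1 - p)
awk '{
  split($2, g, "/")
  n = g[1] + g[2] + g[3]
  p = (2 * g[1] + g[2]) / (2 * n)
  printf "%s %.4f %.4f\n", $1, g[2] / n, 2 * p * (1 - p)
}' toy_geno.txt > het_check.txt
cat het_check.txt
```

rs0003 prints `0.0000 0.4800`: no heterozygotes where 48% are expected, the kind of deficit that yields a very small HWE p-value.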
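All together (drop SNPs with MAF < 0.01, SNPs missing in > 5% of samples, samples missing > 5% of SNPs, SNPs with HWE p < 1e-3; write the result as wgas3.bed/.bim/.fam)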
plink --bfile <file> --maf 0.01 --geno 0.05 --mind 0.05 --hwe 1e-3 --make-bed --out wgas3
wgas3.irem lists the samples (FAMILYID, SAMPLEID) excluded by --mind
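The per-sample missing rate behind --mind is the number of missing genotypes divided by the number of SNPs; a sample is dropped when it exceeds the threshold. A sketch of that rate on a hypothetical toy.ped (missing genotype = "0 0"), writing flagged FAMILYID/SAMPLEID pairs to an illustrative toy_removed.irem. PLINK applies its filters in a fixed internal order, so on real data the removed set can differ from this one-pass calculation.

```shell
# Hypothetical toy ped: S3 has one of its three genotypes missing ("0 0")
cat > toy.ped <<'EOF'
FAM1 S1 0 0 1 2 A G G G C C
FAM2 S2 0 0 2 1 A A G T C T
FAM3 S3 0 0 1 1 A A T T 0 0
FAM4 S4 0 0 2 2 A G G G C C
EOF

# Per-sample missing rate = missing genotypes / number of SNPs;
# print FID and IID when it exceeds the --mind-style threshold
awk -v mind=0.05 '{
  n = (NF - 6) / 2
  miss = 0
  for (j = 1; j <= n; j++)
    if ($(5 + 2 * j) == "0") miss++    # allele 1 of SNP j; missing genotypes are 0 0
  if (miss / n > mind) print $1, $2
}' toy.ped > toy_removed.irem
cat toy_removed.irem
```

Only `FAM3 S3` is flagged (1 of 3 genotypes missing, a rate of 0.33).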