Useful subsets of PDB files
One or more chains have unobserved residues *in the middle* and not just at the termini
With modified residues *other* than MSE / Seleno-Methionine / SeMet
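The "in the middle" criterion above can be sketched as a small check. This is only an illustration: it assumes you already have, for each chain, one boolean per residue of the full sequence saying whether coordinates were observed. The function name `has_internal_gap` and that input format are hypothetical, not part of any PDB tool.

```python
def has_internal_gap(observed):
    """Return True if an unobserved residue lies between two observed ones.

    `observed` is an iterable of booleans, one per residue of the full
    chain sequence, True where coordinates exist (assumed input format).
    """
    flags = list(observed)
    if True not in flags:
        # Nothing observed at all, so there is no "middle" to speak of.
        return False
    first = flags.index(True)
    last = len(flags) - 1 - flags[::-1].index(True)
    # Any False between the first and last observed residue is an internal gap.
    return not all(flags[first:last + 1])
```

A chain missing residues only at its ends, such as `[False, True, True, False]`, is not flagged, while `[True, False, True]` is.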
Other useful files
All FASTA sequences in the PDB
Clustering of 100% sequence-identical chains in the PDB, produced using blastclust with the flag -L 1 (coverage threshold of 100%). By contrast, http://resources.rcsb.org/sequence/clusters/bc-100.out uses a coverage threshold of 0.9, and thus sequences in that file are *not* necessarily 100% identical. See http://pdb.rcsb.org/pdb/statistics/clusterStatistics.do and http://www.ncbi.nlm.nih.gov/Web/Newsltr/Spring04/blastlab.html for further explanation.
The FASTA sequences used do not take modified residues into account. Instead, the parent (standard) residue of each modified residue is given in the FASTA sequence, and the residue modification is not considered a mutation.
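As a rough illustration of substituting parent residues, the sketch below turns three-letter residue names into one-letter FASTA codes. The `PARENT_RESIDUE` table is a tiny hand-picked subset chosen for this example (an assumption, not the table used to build the files listed here), and `to_fasta_letters` is a hypothetical helper name.

```python
# Illustrative subset only: modified residue -> parent (standard) residue.
PARENT_RESIDUE = {
    "MSE": "MET",  # selenomethionine
    "SEP": "SER",  # phosphoserine
    "TPO": "THR",  # phosphothreonine
    "PTR": "TYR",  # phosphotyrosine
}

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def to_fasta_letters(residue_names):
    """Map residue names to one-letter codes via their parent residue."""
    letters = []
    for name in residue_names:
        parent = PARENT_RESIDUE.get(name, name)
        # Residues that are neither standard nor in the table become "X".
        letters.append(THREE_TO_ONE.get(parent, "X"))
    return "".join(letters)
```

With this table, a chain containing MSE yields the same letter as one containing MET, which is why the modification does not show up as a mutation in the sequence.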
bc-100.out
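A minimal sketch for reading such a cluster file follows. It assumes the layout is one cluster per line with whitespace-separated chain identifiers (for example `1ABC_A`). Please check this against the actual file before relying on it. The names `parse_clusters` and `chain_to_cluster` are hypothetical.

```python
def parse_clusters(lines):
    """Return a list of clusters, each a list of chain identifiers.

    Assumes one cluster per line, identifiers separated by whitespace.
    Blank lines are skipped.
    """
    clusters = []
    for line in lines:
        ids = line.split()
        if ids:
            clusters.append(ids)
    return clusters

def chain_to_cluster(clusters):
    """Map each chain identifier to the index of the cluster containing it."""
    return {
        chain: index
        for index, members in enumerate(clusters)
        for chain in members
    }
```

Usage might look like `clusters = parse_clusters(open("bc-100.out"))`, after which two chains are in the same cluster exactly when `chain_to_cluster(clusters)` gives them the same index.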