In this blog I am going to describe the steps you can take to install R, genoPlotR, a few plugins in a linux machine (I have Ubuntu 16 which I upgraded in my biolinux 8 operating system) and other software and make a publication quality ACT-like image of a comparison of genomic DNA sequences or genomic regions.
1) Installing or upgrading R
My machine had R-3.2.3. This may be good enough but if you’re going to start installing things and creating files you want your work to give you “rewards” in the form of a platform you can use for as long as possible before you start finding that the new packages you want to use are incompatible with your old R version and you start having to search in archives for compatible packages and help. There are various help pages on the web on how to migrate your work if you do need to move from an older version of R to a newer one, I’m not going to go into that here but you can find most of the information you need here:
The guide directs you to https://CRAN.R-project.org/mirrors.html to obtain R. This link gives you a list of mirrors from which you have to chose one. In my case I chose the first UK link what is for the University of Bristol. If you select that mirror you get some information on R such as the latest release (Monday 2016-10-31, Sincere Pumpkin Patch) R-3.3.2.tar.gz) and options to download R for Linux, Mac, or Windows. Once you click on the option to download you then need to select your linux version from the next page (I picked ubuntu).
The lazy way in which you can then install R is using typing the following text into a terminal:
sudo apt-get install r-base r-base-dev
In my case I got the following messages:
r-base is already the newest version (3.2.3-4).
r-base-dev is already the newest version (3.2.3-4).
Be aware that because of how rapidly things change and how databases and links are not always updated or maintained you have to expect that you are being lied to and double-check everything.
If you have an eye for detail and followed the guide above you’ll be blinking at the 3.2.3 and wondering how that is the new version when you’ve just been told the new version is 3.3.2.
Unfortunately that’s life. Ignore the message and download the newest version of R . There’s a link to “R Sources” in the left sidebar under Software which gives you a link to R-3.3.2.tar.gz.
Download the file to a folder of your choice and unpack it with
tar -xf R-3.3.2.tar.gz
The R manual gives a list of ways to install the new R version on your system.
I just used
You should get a list of messages on the lines of:
R is now configured for x86_64-pc-linux-gnu
Source directory: .
Installation directory: /usr/local
Congratulations, you now have a up-to-date version or R.
In my case I already had R-3.2.3 and because its not very big, just in case I happen to need to use a package which is incompatible with newer versions I decided to keep it. In order to have the newer version open when I type R in my command line I renamed R in the folder /usr/bin to R-3.2.3 and copied my new R file to /usr/local/bin
R 3.3.2 now opens by simply typing R in a terminal from any folder location.
2) Installing some R packages
To use R as a tool to aid in the analysis of biological data you probably don’t want to have to write your own scripts. The next thing to do if this is the case is install packages which will do the jobs you need to do for you.
A nice and easy to use package for creating genomic or cluster comparison files is genoPlotR, made and maintained by Lionel Guy and I’m going to continue talking about the 0.8.4 version (2015-07-02).
I chose this for several reasons. One is that it includes very convenient methods for importing your data from genbank files, blast files and even alignment files. This issue is not trivial because if your data is in various formats of gff, gff3, gtf annotation files and the tools you want to use tell you that you have to supply files with three columns with specific information then you either have to rewrite the information you need based on the annotation files you have or spend a few days tidying them up with various scripts. Also in my case I am a fan of ACT (Sanger) and find the similarity in its images and the convenience of using the same files to manipulate and annotate your files in ACT and Artemis then feed the same files into genoPlotR very useful.
To install packages start up R in your working folder by typing:
From there if you have a newly installed R equip yourself with Bioconductor:
If you’re new to R check out http://bioconductor.org/packages/release/BiocViews.html#___Software to get an idea as to what other packages you can use from bioconductor.
Next install the genoPlotR, its dependencies and any of the other packages you may need.
You can exit R by typing q()
3) Using R from command line
Now you have everything installed you need to start creating your R analysis pipeline.
You can just write commands in your terminal but I’m going to recommend creating a file in “getedit” or “notepad++” with your pipeline. I’m going to call mine “R_cluster_graphics.analysis”
You can add comments of what you are doing by beginning lines with #.
At any point you can run the script to see what comes out with the command:
R CMD BATCH R_cluster_graphics.analysis
This will create file named R_cluster_graphics.analysis.Rout in the folder you ran it from which you can open and check for error messages and for messages which answer questions in your analysis pipeline for example is.dna_seg which returns TRUE or FALSE.
First add commands to load the packages you need to use:
## State where you would like your images to be save to:
imgPath <- “../img”
pdfPath <- “../pdfs”
imgPath <- “/media/…/genoPlotR”
pdfPath <- “/media/…/genoPlotR”
##Import your genbank data (referred to as dna_seg):
#check dna_seq identity
#Create your Comparison files.
You can make these with blastall or blast+ in a terminal by formatting one sequence as a database and running a blastn or tblastx using the first dna_seg as input and the second one as the database. Here is an example of how to do this for two fasta files that match your genbank files and are in the same folder:
1) Navigate to your folder.
2) Format your database with:
formatdb -i SampleB.fasta -p F
3) Run a blastall search:
blastall -p tblastx -d /media/…/ SampleB.fasta -i /media/…/ SampleA.fasta -o SampleA_SampleB.tblastx -m 9
Alternatively you can also use the NCBI online blast tool at https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome and download your comparison files from there.
#Import your Comparison files.
SampleA_SampleB.comparison <- read_comparison_from_blast(“/media/…/SampleA_SampleB.tblastx”, sort_by = “per_id”,filt_high_evalue = NULL,filt_low_per_id =NULL,filt_length = NULL,color_scheme = NULL)
#check that your comparison file is acceptable:
If you would like to have a distance tree in your image make an alignment and a corresponding dendrogram. There are a number of command line tools and GUI software packages which can do this.
In my case I used MEGA7 to make an alignment with the muscle algorithm, you can try other algorithms and especially if your sequences are large or dissimilar I would recommend trying a few algorithms and picking the most appropriate alignment. You can get various version of MEGA from http://megasoftware.net/ I used the command line version of MEGA7 by opening megaproto (go to the installation folder and type megaproto), selected “non coding nucleotide alignment” and “muscle nuc alignment” from the Align option in the toolbar and changed the cluster method to Neighbor Joining for all iterations then saved these settings as a .mao file.
To make life easier I joined all my fasta sequences with the command
cat sequence1.fasta sequence2.fasta etc > allsequences.fasta
I then ran the alignment with the command:
/media/…/MEGA7/megacc -a /media/…/nucleotide_align.mao -d /media/…/allsequences.fasta -o MEGA7_allsequences.alignment
I then used the alignment to make a tree with RAxML version 8.2.4. which you can get from GitHub at https://github.com/stamatak/standard-RAxML
If you would are also going to use use RAxML check which version you need. If you are able to use multiple threads I would definitely pick the Pthreads option. Especially if you are planning on making trees out of huge alignments in the future I would definitely pick the best version you can. In my case I used the AVX version which is compatible with Intel i7.
You can download the files with the command
git clone https://github.com/stamatak/standard-RAxML.git
and install the appropriate version in the manner described in the Readme file in the software folder. In my case it was:
make -f Makefile.AVX.PTHREADS.gcc
Run RAxML to make your tree. I used the BS and ML recommended settings from the RAxML manual. It didn’t like my MEGA7 alignment so I converted it to a phylip alignment. There are many tools that can handle this but since I don’t have much on my new computer yet I used this link http://sequenceconversion.bugaco.com/converter/biology/sequences/
RAxMLHPC-PTHREADS-AVX -f a -x 12345 -p 12345 -# 100 -m GTRGAMMA -s /media/…/MEGA7_alignment.phylip -n MEGA7_alignment.RAxML -T 6
Note: If you can’t use threads remove the -T 6 bit
This creates a RAxML_bestTree.MEGA7_alignment.RAxML file
You can view your tree using various software packages. One example is dendroscope which comes with biolinux and by typing dendroscope into a terminal it opens a graphical user interphase (GUI) from which you can navigate to your tree file and quickly visualize it.
Give R your tree information by pasting the data from the RAxML_bestTree.MEGA7_alignment.RAxML file into a tree<- newick2phylog(“”) command :
#Pick the shapes you would like your CDS to appear as. I used arrows. You can see the other options with the command: gene_types()
gene_types(auto = FALSE)
SampleA$gene_type <- “arrows”
SampleB$gene_type <- “arrows”
#Give you dna_seg sequences CDS annotation
# Make your DNA-comparison image
pdf(file.path(pdfPath, “My_SampleABetc_comparison.pdf”), h=7, w=10)
plot_gene_map(dna_segs=list(SampleA, SampleB, next samples), comparisons=list(SampleA_SampleB.comparison, next comparisons),annotations=list(annotA,annotB,next annotations),override_color_schemes=TRUE, global_color_scheme=c(“e_value”, “auto”, “red_blue”, “0.5”),dna_seg_labels=c(“SampleA”,”SampleB”,”next labels”),tree=tree)
This should give you a nice comparison pdf which looks like what you always wanted ACT images to look like, in the folder you specified for pdfPath above.
The manual gives lots of detailed information on how you can change the parameters above to alter your image size, colours, annotation etc but I would like to mention that the power of using a file with the commands above to make your image is that once you have it and see what the output looks like you can go back to your genbank files and add any new CDS you discover or change their names then run the script again with “R CMD BATCH R_cluster_graphics.analysis” to get the new image. You can also make several different trees, call them tree1, tree2, tree3 then make several comparison images just by rerunning the script once with more copies of the “# Make your DNA-comparison image” commands with the appropriate tree name specified and the other lists and labels in the correct order. Here is an example:
## Make your comparison tree1
pdf(file.path(pdfPath, “My_SampleABetc_tree1.pdf”), h=7, w=10)
plot_gene_map(dna_segs=list(SampleA, SampleB, next samples), comparisons=list(SampleA_SampleB.comparison, next comparisons),annotations=list(annotA,annotB,next annotations),override_color_schemes=TRUE, global_color_scheme=c(“e_value”, “auto”, “red_blue”, “0.5”),dna_seg_labels=c(“SampleA”,”SampleB”,”next labels”),tree=tree1)
## Make comparison image for tree2 (you may need to make and specify more blastall comparison files)
pdf(file.path(pdfPath, “My_SampleABetc_tree2.pdf”), h=7, w=10)
plot_gene_map(dna_segs=list(SampleB, SampleA, next samples), comparisons=list(SampleB_SampleA.comparison, next comparisons),annotations=list(annotB,annotA,next annotations),override_color_schemes=TRUE, global_color_scheme=c(“e_value”, “auto”, “red_blue”, “0.5”),dna_seg_labels=c(“SampleB”,”SampleA”,”next labels”),tree=tree2)
#You can also add different types of annotation and gene_types by giving them different names then making even more images each with a different type of annotation or graphics for your CDS.
4) Check your images
Hopefully if you followed this guide you should have some very pretty pdf files of your genomic DNA comparisons now. It is a good idea to scrutinize your images and make sure they look realistic. If it looks like the comparison lines don’t seem quite right between two or more samples, for example you have 3 conserved geneA sets but the comparison lines give good identity to irrelevant regions, check that you used the correct reference and input sequences in your blast and try supplying a comparison file with the sequences give in the reverse order. Unfortunately genoPlotR does not automatically get the sample information from the blast files so this is something you have to be careful about.
5) Final words
I would like to give thanks to Lionel Guy himself in this post as besides creating this very useful and easy-to-use R tool (compared to many other R tools) in two instances where I could not tell from the manual how to deal with my annotations and tree, he responded in less than 24 hours to my e-mailed questions with precise and very helpful answers. This blog is by no means a substitute for the software manuals or a claim of expertise but rather what I hope is a simple guide for biologists which are unfamiliar with software manuals, R code and the sort of mistakes they can make if you are not aware of issues such as order of commands and recognition of data structures and need a simple guide on how to go about starting to use R to get a publication-quality image that looks like your lovely ACT comparison.
- ACT:Carver, Tim J., et al. “ACT: the Artemis comparison tool.” Bioinformatics 21.16 (2005): 3422-3423 http://www.sanger.ac.uk/science/tools/artemis-comparison-tool-act
- Artemis: Rutherford, Kim, et al. “Artemis: sequence visualization and annotation.” Bioinformatics 16.10 (2000): 944-945http://www.sanger.ac.uk/science/tools/artemis
- genoPlotR: Guy, Lionel, Jens Roat Kultima, and Siv GE Andersson. “genoPlotR: comparative gene and genome visualization in R.” Bioinformatics 26.18 (2010): 2334-2335 https://cran.r-project.org/web/packages/genoPlotR/index.html
- BLAST: http://www.ncbi.nlm.nih.gov/blast
- Bioconductor: http://bioconductor.org/
- MEGA7: http://megasoftware.net/
- RAxML: A. Stamatakis: “RAxML Version 8: A tool for Phylogenetic Analysis and Post-Analysis of Large Phylogenies”. In Bioinformatics, 2014, open access.
- GitHub: https://github.com/
- Biolinux: http://environmentalomics.org/bio-linux/
- Hatziioanou D., Gherghisan-Filip C., Saalbach G., Horn N., Wegmann U., Duncan S., Flint H, Mayer M.J., Narbad A. Discovery of a novel lantibiotic nisin O from Blautia obeum A2-162, isolated from the human gastrointestinal tract, Microbiology (2017) 163: 1292-1305 (genoPlotR image made using this process).
New CRISPR-Cas9 screening method identifies “magic bullet” molecular targeted therapy candidates to help the body’s immune system fight cancer
Molecular targeted therapy (composed of small-molecule drugs and monoclonal antibodies) blocks the growth and spread of cancer by interfering with specific molecules that are critical for tumor progression such as checkpoint inhibitors1.
The potential of this approach was recognized in 1998, when the antibody trastuzumab (Herceptin) against receptor tyrosine kinase HER2 was approved by the FDA for treatment of patients with HER2-receptor positive metastatic breast cancer2. Subsequently in 2001, imatinib (Gleevec), a small molecule kinase inhibitor3 received FDA approval for the treatment of chronic myeloid leukemia (CML). Imatinib was the first rationally designed small-molecule inhibitor, it made the cover of “TIME” magazine as the “magic bullet” and was considered as the start of a new era in anticancer drug discovery4. In recent years our advances in tumor immunology and immunotherapy have led to FDA and EMA approval of several checkpoint inhibitors such as PD-1 and PD-L1 in clinical practice5.
Dr W. Nick Haining, an associate professor of paediatrics at Harvard Medical School, associate member of the Broad Institute of MIT and Harvard and team leader of the Dana-Farber/Boston Children’s team, in a recent press release voiced his concerns that despite the clinical success of PD-1 checkpoint inhibitors the majority of patients’ cancer does not respond to this treatment. This has triggered a number of studies looking at drugs to increase the number of patients responding to PD-1 inhibitor treatment.
In their recent publication6 Dr Hainings’ team showed that CRISPR-Cas9 targeted genome editing technology which up to now has been used to identifying genes required by tumor cells for proliferation, metastasis and drug resistance7,8, could be used to look at the interaction of tumor cells with the natural immune system to identify candidate drug targets for sensitizing cancerous cells to the immune system. They did this by disabling 2368 genes in a library of melanoma cells and transplanting the cells into Tcra−/− mice (which lack CD4+ and CD8+ T cells) and wild type mice given the GVAX vaccine9 to stimulate their immune response or GVAX and anti-PD-1 checkpoint inhibitor.
By inspecting the list of disabled genes that were depleted from immunotherapy treated tumor s in their in vivo screening model they found the known immune evasion molecules PD-L110 and CD4711 and confirmed their susceptibility to GVAX and PD-1 blockade thus validating that their genetic screening method is capable of finding genes which give cancer cells immune evasion properties.
The Manguso et al team went on to examine the biological pathways associated with other genes deletions and based on those concentrated on Ptpn2. They showed that Ptpn2 increases the resistance of cancer cells to immunotherapy, its loss increases antigen presentation and susceptibility to cytotoxic CD8+ T cells while also increasing IFNγ sensing by tumor cells and that the immune susceptibility is dependent on IFNγ. The Haining team is now thinking about small molecule Ptpn2 inhibitor drugs to make use of their candidate magic bullet.
- Huang, Min, et al. “Molecularly targeted cancer therapy: some lessons from the past decade.” Trends in pharmacological sciences 35.1 (2014): 41-50.
- Hudziak, Robert M., et al. “p185HER2 monoclonal antibody has antiproliferative effects in vitro and sensitizes human breast tumor cells to tumor necrosis factor.” Molecular and cellular biology 9.3 (1989): 1165-1172.
- Druker, Brian J., et al. “Efficacy and safety of a specific inhibitor of the BCR-ABL tyrosine kinase in chronic myeloid leukemia.” N Engl j Med 2001.344 (2001): 1031-1037.
- Danchev, N., I. Nikolova, and G. Momekov. “A New Era in Anticancer Therapy/Imatinib—A New Era in Anticancer Therapy.” (2008): 769-770.
- Rossi, Sabrina, et al. “Clinical characteristics of patient selection and imaging predictors of outcome in solid tumors treated with checkpoint-inhibitors.” European Journal of Nuclear Medicine and Molecular Imaging (2017): 1-16.
- Manguso, Robert T., et al. “In vivo CRISPR screening identifies Ptpn2 as a cancer immunotherapy target.” Nature 547.7664 (2017): 413-418.
- Hart, Traver, et al. “High-resolution CRISPR screens reveal fitness genes and genotype-specific cancer liabilities.” Cell 163.6 (2015): 1515-1526.
- Chen, Sidi, et al. “Genome-wide CRISPR screen in a mouse model of tumor growth and metastasis.” Cell 160.6 (2015): 1246-1260.
- Dranoff, Glenn. “GM-CSF-secreting melanoma vaccines.” Oncogene 22.20 (2003): 3188-3192.
- Duraiswamy, Jaikumar, et al. “Dual blockade of PD-1 and CTLA-4 combined with tumor vaccine effectively restores T-cell rejection function in tumors.” Cancer research 73.12 (2013): 3591-3603.
- Tseng, Diane, et al. “Anti-CD47 antibody–mediated phagocytosis of cancer by macrophages primes an effective antitumor T-cell response.” Proceedings of the National Academy of Sciences 110.27 (2013): 11103-11108.
Difficult long Greek surnames.
I still remember the look of utter incomprehension and disbelief when, after being asked my surname in the UK I used to just tell people.
Me> My surname is Hatziioanou.
This was usually followed by a blank stare or silence over the phone (depending on the circumstance) during which I could actually picture strong “brain machines” grinding to an abrupt halt for a few seconds and then… ever so slowly and irregularly… grinding to a forced start as they battled to retain the echoes of this cryptic word,… and process it in order to find some familiar sounds which they can manage to repeat or scribble down.
This was always said with either annoyance, certainty or fear.
Person > Can you say that again please?
Me > Sure, it’s Hhhaaaaattzziiiiiiiioooaaannnnnouuuu
I would repeat slowly to make it easier for them.
After a few weeks in the UK I realized it was completely futile to tell people my surname, it was simply impossible for anyone to make head or tail from it. It was only when I had to face Chinese names while living in Taiwan years later that I realized what my surname must sound like to foreigners. For me it is a normal everyday easy to remember Greek surname which has a common structure composing of the words Hatzi and the name of one of my ancestors, Ioannis (or John), with the name’s ending changed to Ioannou to give it the meaning “of Ioannis”. I miss one of the “n”s out in English to make it slightly easier for people. That’s as easy as a name can get right? In Greece we often don’t even write the full name but use a shortened version such as H”Ioannou instead because “Hatzi+name” surnames are so common everyone immediately understands the name and how it is written in full.
I quickly learned to stop telling people my surname if it was to be recorded. Instead, when asked I would answer:
Me > Sure, would you like me to spell it out for you?
Person > *Suspicious confusion* um ok…
Me > That’s H, A, T, Z, double I, O, A, N, O, U. Would you like me to repeat that for you?
Person > *Look of disbelief* No, that’s alright……..
Person > …..And how do you pronounce that?
Me > Hatziioanou *smile*, thats Hhhaaaaattzziiiiiiiioooaaannnnnouuuu.
Person > Hjnoyidj?… znoyaaaanov?? right! *look of slightly guilty uncertainty and incomprehension*
It might not be the name they expected or the way they preferred to have surnames delivered but I quickly came to find my surname was wonderfully useful for starting a conversation or breaking the ice with people who had written surnames down as a matter of routine and had hardly looked at me before! Frequently I would be asked:
Person > So what does your surname mean?
Just so you know that I am not making up this story about my surname I would like to direct you to the following text copied from the link: http://genealogy.familyeducation.com/surname-origin/hatzis
“ Last name origin & meaning of Hatzis
Greek: from the vocabulary word khatzis ‘pilgrim (to Jerusalem)’, from the Arabic hajji ‘pilgrim (to Mecca)’, borne originally by Greek Muslims. Having completed a pilgrimage to the Holy Land was a mark of high social distinction. Often, this surname is a reduced form of a surname with Hatzi- as a prefix to a patronymic, naming the ancestor who performed the pilgrimage; e.g. Hatzimarkou ‘son of Mark the Pilgrim’, Hatzioannou ‘son of John the Pilgrim’.”
So there you are! I had an ancestor called John who went on a pilgrimage to Jerusalem and since then his descendents have had the surname: of John the pilgrim!
As with any translation of a name you will find it spelled in many different ways in English, the main one being Chatziioannou (which sounds completely different from the name in Greek) but they are all the same name, written the same way in Greek which people have one way or another hammered into Latin letters.
My conclusion from the whole story: I’ve come across so many people with difficult surnames that seem embarrassed to say their name and stutter it in low, barely audible voices or repeatedly shout it at you defensively to your surprise and confusion. If you have a difficult surname, make the most out of it and spell it out to people, smile at their efforts and explain it to them. If you’re the shy sort you have a powerful tool to start conversations and get to know people. If you’re faced with an impossible surname, don’t panic, ask for it to be spelled out and test your ability to say it back to them, you can both get a laugh out of it!