Working with R and Bioconductor on the cloud (Amazon EC2)
Some Bioconductor-based projects may be computationally challenging and require a lot of resources. If a powerful workstation is not available, it may be a good idea to work with R and Bioconductor at scale using Amazon Web Services (AWS). Setting…
Building and Publishing R packages on CRAN or GitHub
Building an R package is an easy and convenient way to keep your work well organized. Moreover, it facilitates sharing your code with the R community. Here we will discuss publishing R packages on CRAN and GitHub. The post…
DNA sequence manipulation in R: getting the Reverse Complement of a DNA string
Manipulating DNA/RNA sequences is a fundamental operation in Molecular Biology. Writing the reverse complement of a DNA sequence is very easy, but it is also an error-prone operation if performed manually. Sequence manipulation tools are available online and free-of-charge…
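As a quick illustration of the operation described above, here is a minimal sketch in base R. The helper name `reverse_complement` is hypothetical (it is not taken from the post), and the sketch assumes the input contains only the unambiguous bases A, C, G and T; any other character is passed through unchanged.

```r
# Minimal sketch: reverse complement of a DNA string using base R only.
# `reverse_complement` is a hypothetical helper name used for illustration.
reverse_complement <- function(dna) {
  # Complement each base; chartr() maps characters one-to-one
  complemented <- chartr("ACGTacgt", "TGCAtgca", dna)
  # Split into single characters, reverse their order, and rejoin
  paste(rev(strsplit(complemented, "")[[1]]), collapse = "")
}

reverse_complement("ATGCAAGT")
# [1] "ACTTGCAT"
```

For real projects, the Bioconductor package Biostrings offers `reverseComplement()` for `DNAString` objects, which also handles IUPAC ambiguity codes.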
easyPubMed for business: scraping PubMed data in R for a targeting campaign
In this post, I will cover how to use easyPubMed (R Package) to retrieve data from PubMed. This example is focused on data extraction from PubMed records for a targeting campaign. The post is aimed at suggesting a business-oriented way…
Querying PubMed via the easyPubMed package in R
PubMed (NCBI Entrez) is an online database of citations for biomedical literature that is available at the following URL: http://www.ncbi.nlm.nih.gov/pubmed. Retrieving data from PubMed is also possible in an automated way via the NCBI Entrez E-utilities. A description of how…
Scraping Impact Factor data from the Web using httr and regex in R
A couple of days ago, I found a website listing Impact Factor data of many scientific journals organized in HTML tables (http://www.citefactor.org). Unfortunately, this website didn’t allow users to download Impact Factor tables in one click. Moreover, data were scattered over…
Retrieving Expression Levels of all Members of a Gene Family (GO) from an Oncomine DataSet
This page describes a quick way to extract gene expression information for a specific group of genes (as defined in one or more GO terms) from an Oncomine DataSet. This is useful, for example, if we want to study a…
Exploratory analysis of datasets obtained from GEO
Gene Expression Omnibus (GEO) is an online public repository of functional genomics data. Information about GEO may be found at the following URL: http://www.ncbi.nlm.nih.gov/geo/info/faq.html. Briefly, GEO includes different types of datasets: GEO Profiles are curated datasets obtained from GEO DataSets…
Colorful Hierarchical Clustering Dendrograms with R
Hierarchical clustering is a very effective method for exploratory data analysis and is aimed at building a hierarchy of clusters based on the similarity of the samples in a dataset. The idea behind hierarchical clustering is very intuitive. Let’s assume…
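To make the idea concrete, here is a minimal sketch using only base R (`dist`, `hclust`, `cutree`, `dendrapply`). The data are simulated, and the choice of Euclidean distance, average linkage, two clusters and the two colors are all assumptions made for illustration, not necessarily what the post uses.

```r
set.seed(1)
# Simulated data: 6 samples (rows) x 4 features (columns); purely illustrative
x <- matrix(rnorm(24), nrow = 6,
            dimnames = list(paste0("sample", 1:6), NULL))

d  <- dist(x)                         # Euclidean distances between samples
hc <- hclust(d, method = "average")   # build the hierarchy of clusters
groups <- cutree(hc, k = 2)           # assign each sample to one of 2 clusters

# Color each leaf label according to its cluster (colors are arbitrary)
cluster_colors <- c("firebrick", "steelblue")
color_leaf <- function(node) {
  if (is.leaf(node)) {
    label <- attr(node, "label")
    attr(node, "nodePar") <- list(lab.col = cluster_colors[groups[label]],
                                  pch = NA)
  }
  node
}
dend_colored <- dendrapply(as.dendrogram(hc), color_leaf)
plot(dend_colored, main = "Samples colored by cluster")
```

Changing `k` in `cutree()` (and extending `cluster_colors` accordingly) controls how many groups are highlighted.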
RPKM calculation and relative gene expression quantification
RNAseq data may provide an estimate of the relative expression level of different genes in a sample or in a cell type. It is sufficient to compare RPKM (reads per kilobase of transcript per million mapped reads) values of the genes of…
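The RPKM definition translates directly into a one-line calculation: reads mapped to a gene, divided by the gene length in kilobases and by the total number of mapped reads in millions. The sketch below assumes made-up counts and gene lengths; the function name `rpkm` and the example genes are hypothetical and only serve the illustration.

```r
# Minimal sketch of an RPKM calculation.
# counts:             reads mapped to each gene
# gene_length_bp:     gene (transcript) length in base pairs
# total_mapped_reads: total number of mapped reads in the sample
rpkm <- function(counts, gene_length_bp, total_mapped_reads) {
  # 1e9 = 1e3 (bp -> kilobases) * 1e6 (reads -> millions of reads)
  counts * 1e9 / (gene_length_bp * total_mapped_reads)
}

# Made-up example values
counts     <- c(geneA = 500,  geneB = 1000, geneC = 250)
lengths_bp <- c(geneA = 2000, geneB = 4000, geneC = 500)

rpkm_values <- rpkm(counts, lengths_bp, total_mapped_reads = 1e6)
rpkm_values
# geneA geneB geneC 
#   250   250   500 
```

Note that geneB has twice as many reads as geneA but is also twice as long, so both end up with the same RPKM; geneC has the fewest reads but the highest RPKM because it is much shorter.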