Member: Post-doc - Alumni

Sohta Ishikawa

About

Research Themes

Methodological studies for phylogenetic artifacts caused by compositional biases of sequences

Performance assessment of RY-coding and non-homogeneous models in phylogenetic inferences from nucleotide sequences with significant compositional heterogeneity
In phylogenetic analyses of nucleotide sequences, ‘homogeneous’ substitution models, which assume the stationarity of base composition across lineages, are widely used. However, a homogeneous model-based analysis can yield an artifactual tree when the data exhibit heterogeneous base compositions among sequences. Potential artifacts stemming from compositional heterogeneity in tree reconstruction can be countered by two approaches, ‘RY-coding’ and ‘non-homogeneous (NH)’ models. The former approach converts the four bases into two-state characters, purine (R) and pyrimidine (Y), to homogenize their compositions among sequences (Phillips and Penny, 2003). In contrast, the latter approach explicitly incorporates compositional heterogeneity by allocating free model parameters in a branch-by-branch fashion (Galtier and Gouy, 1998; Dutheil and Boussau, 2008). Although these approaches have been applied to several real-world data analyses, their basic properties have not been fully examined in simulation studies.
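As a rough illustration of the RY-coding idea described above, the recoding step can be sketched in a few lines of Python. The function name `ry_recode` and the choice to pass gaps and ambiguity codes through unchanged are assumptions made for this example, not details taken from the cited studies.

```python
# Illustrative RY-recoding sketch; names and gap handling are assumptions.
PURINES = set("AG")
PYRIMIDINES = set("CTU")


def ry_recode(sequence: str) -> str:
    """Recode a nucleotide sequence into purine (R) / pyrimidine (Y) states."""
    recoded = []
    for base in sequence.upper():
        if base in PURINES:
            recoded.append("R")
        elif base in PYRIMIDINES:
            recoded.append("Y")
        else:
            # Gaps and ambiguity codes are passed through unchanged (an assumption).
            recoded.append(base)
    return "".join(recoded)


if __name__ == "__main__":
    print(ry_recode("ACGT-N"))
```

After recoding, A/G and C/T differences among sequences are no longer visible to the substitution model, which is why the approach can reduce compositional differences but also discards transition information.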

In this study, I conducted what is effectively the first simulation study to assess the performance of maximum-likelihood phylogenetic analyses incorporating RY-coding and NH models in the presence of compositional heterogeneity. These two methods were applied to the analyses of ‘4-taxon’ datasets bearing various degrees of heterogeneity in adenine and thymine (AT) content. Compared to a homogeneous model-based analysis, both RY-coding and NH model-based analyses were better at reconstructing the true phylogenetic relationships in the face of ~20% AT content difference among sequences. Nevertheless, I found that the accuracy of phylogenetic inference based on RY-coding depends, at least to some extent, on the substitution process that generated the sequence data of interest (e.g., the transition/transversion ratio). Furthermore, the inferences from RY-coding-based analyses can be severely biased when the data recoding cannot ameliorate complex patterns of compositional heterogeneity in the data. On the other hand, NH models appeared to be robust against all types of compositional heterogeneity examined in this study, and are widely applicable to phylogenetic analyses of various empirical datasets. For more information, please refer to Ishikawa, Inagaki, and Hashimoto (2012) listed in my CV.
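The degree of AT-content heterogeneity mentioned above can be quantified in a simple way, for instance as the largest difference in AT fraction between any two sequences in an alignment. The sketch below assumes that measure; the function names and the toy sequences are hypothetical and are not taken from the study.

```python
from typing import Mapping


def at_content(sequence: str) -> float:
    """Return the fraction of A and T among the unambiguous bases (A, C, G, T)."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(base in "AT" for base in counted) / len(counted)


def max_at_difference(sequences: Mapping[str, str]) -> float:
    """Return the largest AT-content difference between any two sequences."""
    values = [at_content(seq) for seq in sequences.values()]
    return max(values) - min(values)


if __name__ == "__main__":
    # Toy 4-taxon example with made-up sequences.
    toy = {"t1": "AATTAATT", "t2": "AATTACGT", "t3": "ACGTACGT", "t4": "ACGTACGC"}
    print(max_at_difference(toy))
```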

Computational challenges for the efficient parallelization of phylogenetic inferences with non-homogeneous models on current supercomputing systems

Recent advances in genome sequencing techniques enable us to phylogenetically analyze large matrices composed of hundreds of genes derived from diverse organisms. Such ‘phylogenomic analyses,’ however, are often influenced by the heterogeneity of base or amino-acid composition, codon usage, and substitution rate across genomes, or even within a genome. Non-homogeneous (NH) models are expected to be critical for ameliorating the artifacts caused by these systematic biases in phylogenomic analyses. Nevertheless, phylogenomic analyses have been conducted almost exclusively under homogeneous models, for two reasons. Firstly, phylogenetic inferences based on NH models can be computationally much more intensive than those based on homogeneous models, because NH models require an enormous number of model parameters to be optimized. Secondly, the currently available phylogenetic programs that exploit parallel computing techniques on large numbers of CPUs (and GPUs) implement only homogeneous models. Therefore, there is an urgent need for a new phylogenetic program that combines efficient parallel computing methods with NH models.

For this computational effort, I have collaborated with the laboratory for High Performance Computing Systems at the University of Tsukuba, aiming to parallelize a phylogenetic program, ‘NHML’, which implements an NH model that allows the AT content to vary across lineages (Galtier and Gouy, 1998). A fine-grained parallelization by OpenMP was applied to the calculation of site-wise log-likelihoods (site-lnLs) for a given tree, while a coarse-grained parallelization by Message Passing Interface (MPI) was applied to the computation of alternative trees during the ML tree search based on the SPR method. In addition to this ‘Hybrid’ parallelization, I newly implemented a medium-grained parallelization by MPI: during the lnL calculation for a given tree, the optimization of model parameters (e.g., equilibrium AT content on each branch), as well as of branch lengths, can be assigned to different groups of MPI processes in parallel. The performance of the ‘multi-grained’ parallelization on NHML was benchmarked by analyzing simulation datasets including ~130 species and ~10,000 nucleotide positions. Consequently, I achieved acceptable speedup (i.e., parallel efficiency >= 0.5) of the maximum-likelihood tree inference up to 64 computational nodes and 1,024 CPU cores on a supercomputer system, ‘T2K-Tsukuba’ (http://www.top500.org/system/176215), at the Center for Computational Sciences, University of Tsukuba.
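The parallel-efficiency criterion quoted above can be made concrete with the usual definition: speedup divided by the factor by which the resources were increased. The sketch below assumes that definition and a baseline run on a smaller number of nodes; the function name and the timings are hypothetical and are not benchmark results from the NHML work.

```python
def parallel_efficiency(
    t_base: float, t_parallel: float, n_base: int, n_parallel: int
) -> float:
    """Return the parallel efficiency of a run relative to a baseline run.

    t_base / t_parallel is the observed speedup, and n_parallel / n_base is
    the ideal speedup for the added resources (nodes or cores).
    """
    if min(t_base, t_parallel) <= 0 or min(n_base, n_parallel) <= 0:
        raise ValueError("timings and resource counts must be positive")
    speedup = t_base / t_parallel
    ideal_speedup = n_parallel / n_base
    return speedup / ideal_speedup


if __name__ == "__main__":
    # Hypothetical timings in seconds: 1 node versus 64 nodes.
    efficiency = parallel_efficiency(1000.0, 25.0, 1, 64)
    print(efficiency, efficiency >= 0.5)
```

With these made-up numbers the speedup is 40 on 64 nodes, so the run would meet the efficiency >= 0.5 threshold used as the criterion in the text.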

Detection of gene conversion (recombination) events among bacterial sequences based on phylogenetic methods

Bacteria have two paralogs of peptide-chain release factor, RF1 and RF2, which differ from each other in stop-codon recognition. The two RF families are generally expected to have taken independent evolutionary paths after they arose from a single gene-duplication event in the ancestral bacterial genome. However, my survey based on phylogenetic and statistical methods detected inter- or intra-genomic conversions between RF1 and RF2 genes in diverse bacterial genomes. These conversions encompass a domain that has a key role in the interaction with the ribosome during the translation termination process. Structural analyses suggested that conversions of the corresponding region are functionally neutral for both RF1 and RF2, implying that the frequency of ‘partial’ conversion between paralogous genes is higher than we generally assume. For more detailed information, please see Ishikawa, Kamikawa, and Inagaki (2015) listed in my CV.

Collaboration on large-scale phylogenetic analyses

In addition to the main research themes mentioned above, I have collaborated with a number of evolutionary biologists and worked on the global phylogeny of eukaryotes. In particular, I contributed substantially to two large projects to elucidate the evolutionary affiliations of two novel microbial eukaryotes, Tsukubamonas globosa and Palpitomonas bilix. I took the initiative in running the 157-protein-based phylogenomic analyses to determine the positions of T. globosa and P. bilix in the global phylogeny of eukaryotes. I also carried out statistical analyses to investigate underlying systematic errors (e.g., long branch attraction, compositional biases, covarions).

CV

EDUCATION

Ph.D., Graduate School of Life and Environmental Sciences, University of Tsukuba [April 2012 – March 2015]

M.Eng., Graduate School of Systems and Information Engineering, University of Tsukuba [April 2012 – March 2014 (Dual-Degree program)]

M.Sc., Graduate School of Life and Environmental Sciences, University of Tsukuba [April 2010 – March 2012]

B.Sc., College of Biological Sciences, University of Tsukuba [April 2006 – March 2010]

EMPLOYMENT

Research Fellow of the Japan Society for the Promotion of Science (PD), University of Tokyo and Institut Pasteur as host institutions [April 2016 – March 2019, scheduled]

Postdoctoral Researcher at the Center of Bioinformatics, Biostatistics and Integrative Biology, Institut Pasteur, France [March 2016]

Research Fellow at Faculty of Life and Environmental Sciences, University of Tsukuba [April 2015 – February 2016]

Research Fellow of the Japan Society for the Promotion of Science (DC1), University of Tsukuba as a host institution [April 2012 – March 2015]

GRANTS

Japan Society for the Promotion of Science, Grant-in-Aid for JSPS Fellows (PD). “Evolution of the ‘community genomes’ of the human gut microbiome via frequent transfers of functional genes”.  [April 2016 – March 2019, scheduled]
Project number: XXX
Role: PI, Amount: XXX

Japan Society for the Promotion of Science, Grant-in-Aid for JSPS Fellows (DC1). “Large-scale phylogenetic analyses for the diversity and the origin of fish parasites in Parabodonidae”. [April 2012 – March 2015]
Project number: 24007
Role: PI, Amount: 2,700,000 JPY

TEACHING EXPERIENCE

2011 Teaching Assistant, “Molecular Evolution”, University of Tsukuba
2010 Teaching Assistant, “Molecular Evolution”, University of Tsukuba

MEMBERSHIPS

2013 – Society of Systematic Biologists
2009 – Society of Evolutionary Studies, Japan
2009 – 2010 Japanese Society of Phycology

PUBLICATIONS

Peer-reviewed Journal Papers
†: Equally contributed authors

  1. Templeton T, Asada M, Jiratanh M, Sohta A. Ishikawa, Tiawsirisup S, Sivakumar T, Namangala B, Takeda M, Mohkaew K, Ngamjituea S, Inoue N, Sugimoto C, Inagaki Y, Suzuki Y, Yokoyama N, Kaewthamasorn M, Kaneko O. (2016), Ungulate malaria parasites. Accepted for publication in Scientific Reports.
  2. Sohta A. Ishikawa, Ryoma Kamikawa, Inagaki Y. (2015), Multiple conversion between the genes encoding bacterial class-I release factors. Scientific Reports, 5:12406.
  3. Kamikawa R, Tanifuji G, Sohta A. Ishikawa, Ishii K, Matsuo Y, Onodera N, Ishida K, Hashimoto T, Miyashita H, Mayama S, Inagaki Y. (2015), Proposal of a Twin-arginine translocator system–mediated constraint against loss of ATP synthase genes from nonphotosynthetic plastid genomes. Molecular Biology and Evolution, 32(10):2598–2604.
  4. Sohta A. Ishikawa, Nakao M, Inagaki Y, Hashimoto T, Sato M. (2014), MPI/OpenMP HYBRID Parallelization of Phylogenetic Analyses based on Non-Homogeneous Substitution Models: Implementation and Performance Evaluation for Large-Scale Computing Systems. IPSJ Transactions on Advanced Computing Systems, 7(3), pp 13–24. written in Japanese
  5. Yabuki A†, Kamikawa R†, Sohta A. Ishikawa, Kolisko M, Kim E, Tanabe AS, Kume K, Ishida K, Inagaki Y. (2014), Palpitomonas bilix presents a basal cryptist lineage: insight into the character evolution in Cryptista. Scientific Reports, 4:4641.
  6. Kamikawa R, Kolisko M, Nishimura Y, Yabuki A, Brown MW, Sohta A. Ishikawa, Ishida K, Roger AJ, Hashimoto T, Inagaki Y. (2014), Gene-content evolution in discobid mitochondria deduced from the phylogenetic position and complete mitochondrial genome of Tsukubamonas globosa. Genome Biology and Evolution, 6(2), pp 306-315.
  7. Nagayasu E, Sohta A. Ishikawa, Taketani S, Chakraborty G, Yoshida A, Inagaki Y, Maruyama H. (2013), Identification of a bacteria-like ferrochelatase in Strongyloides venezuelensis, an animal parasitic Nematode. PLOS ONE, 8(3), e58458.
  8. Sohta A. Ishikawa, Hashimoto T. (2012), Assessment of the performance of phylogenetic inference based on simulated protein-coding sequences with significant compositional heterogeneity. Proceedings of the Institute of Statistical Mathematics, 60(2), pp 289-303. written in Japanese
  9. Sohta A. Ishikawa, Inagaki Y, Hashimoto T. (2012), RY-coding and non-homogeneous models can ameliorate the maximum-likelihood inferences from nucleotide sequence data with parallel compositional heterogeneity. Evolutionary Bioinformatics, 8, pp 357-371.
  10. Ishitani Y†, Sohta A. Ishikawa†, Inagaki Y, Tsuchiya M, Takahashi K, Takishita K. (2011), Multigene phylogenetic analyses including diverse radiolarian species support the “Retaria” hypothesis – the sister relationship of Radiolaria and Foraminifera. Marine Micropaleontology, 81(1), pp 32-42.
  11. Matsumoto T, Sohta A. Ishikawa, Hashimoto T, Inagaki Y. (2011), A deviant genetic code in the green alga-derived plastid in the dinoflagellate Lepidodinium chlorophorum. Molecular Phylogenetics and Evolution, 60(1), pp 68-72.
  12. Reimer JD, Sohta A. Ishikawa, Hirose M. (2011), New records and molecular characterization of Acrozoanthus (Cnidaria: Anthozoa: Hexacorallia) and its endosymbionts (Symbiodinium spp.) from Taiwan. Marine Biodiversity, 41(2), pp 313-323.

Peer-reviewed Conference Papers

  1. Sohta A. Ishikawa, Nakao M, Inagaki Y, Hashimoto T, Sato M. (2014), Hybrid MPI/OpenMP parallelization of a phylogenetic program with Non-Homogeneous models: toward the analyses of large-scale sequence datasets. High Performance Computing Symposium 2014, pp 10-20. written in Japanese