Ecology of Marine Microbes

Sarahi Garcia
Jessika Füssel
Florian Trigodet

Preface

The purpose of this document is to share the details of the lectures and hands-on exercises for the course “Ecology of Marine Microbes (EMM)”. In the following sections you will find an hour-by-hour plan for our activities, the learning objectives we will cover, as well as suggested reading material and datasets. All other key information is at the very end of the page to ensure quick access to the schedule. If this is the first time you are looking at this document, please take a look at those sections as well.

Every lecture will take place in the main conference room of the new HIFMB building at Im Technologiepark 5, 26129 Oldenburg.

You will need a personal laptop (not a tablet) for the entire duration of this course.

Week 1

Day 1, Monday (01/06/26)

The first day introduces the foundational questions of microbial ecology and provides the computational entry point for the course.

09:00 - 09:30: Course Logistics

A discussion of what will happen throughout the coming weeks and beyond. A great time to take a look at the course syllabus online together:

https://merenlab.org/courses/EMM/

09:30 - 10:00: Installation check

Making sure everyone has R, RStudio, a terminal, and anvi’o installed on their computers. For the anvi’o installation, make sure to install the ‘development version’ and not v9. In addition to the installation, please follow the instructions to download some key resources (see 6.1 Setup key resources).

10:00 - 11:00: A brief introduction to microbial diversity

We begin by exploring how scientists study microbial diversity, focusing on three guiding questions:

  • Who is there?
  • What are they doing?
  • How are they doing it?

We will discuss how these questions are addressed using modern molecular approaches, and emphasize the assumptions, limitations, and interpretations that shape our understanding.

11:00 - 12:00: A brief introduction to High Performance Computing

We introduce the computational environment used throughout the course. Students will gain access to the high-performance computing system (ROSA) and learn the basics required to begin working with large-scale microbial data.

Learning Objectives:

  • Access and navigate the HPC environment (ROSA)
  • Execute basic commands in a remote computing environment

13:00 - 17:00: Introduction to the Swedish lakes dataset and metaWRAP

In this exercise, we will begin working with real metagenomic data to reconstruct microbial genomes. We will use metagenomes from a freshwater dataset: Rodríguez-Gijón et al., 2023. Scientific Data.

These data will serve as a starting point to explore how genomes can be reconstructed from environmental sequencing data. The metagenomes are already available on ROSA.

We will follow a modified version of the metaWRAP pipeline (the script is in the file script.html).

Learning Objectives:

  • Explain how bins (MAGs) are generated from assembled data
  • Execute a metagenomic workflow using metaWRAP

Note: You won’t finish this exercise today, but you will continue working on it for the rest of this week and submit the final table by the end of the week.

Day 2, Tuesday (02/06/26)

The day of genome size and microbial life strategies.

09:00 - 09:30: Genome Size and Microbial Life Strategies

Genome size varies widely across Archaea and Bacteria, but this variation is not random. In this session, we will explore how genome size relates to microbial ecology and how it reflects different evolutionary and ecological strategies.

10:00 - 17:00: Exploring genome size across microbial lineages and ecosystems

In this exercise, we will explore how genome size varies across microbial groups and environments using a large comparative genome dataset. We will use the dataset compiled for the study Rodríguez-Gijón et al., 2022. Front. Microbiology.

This exercise introduces genome size as an ecological trait and provides practice in data exploration and visualization using R. We will examine how genome size varies across taxonomic groups, ecosystem types, and genomic features such as GC content. You will use the R script genomes_visualization_tutorial.R.

Learning Objectives:

  • Inspect and subset a genomic dataset in R
  • Identify broad patterns in genome size across taxa and environments
  • Generate density plots, scatter plots, and violin plots in ggplot2
  • Interpret genome size as a genomic and ecological trait

This exercise is designed both as an introduction to R-based visualization and as a conceptual entry point into microbial life strategies. Focus not only on producing figures, but also on asking what biological interpretation the figures support – and what they do not.

Day 3, Wednesday (03/06/26)

This day is dedicated to consolidating the concepts and workflows introduced in Days 1 and 2.

09:00 - 12:00: Continuation of exercises

Students will continue working on:

  • Genome-resolved metagenomics using metaWRAP
  • Data exploration and visualization in R

13:00 - 15:00: Student presentations

The second part of the day focuses on communicating results. Each student (or pair) will present:

  • Research question
  • Approach and code
  • Key figures
  • Interpretation

Presentations should be concise and focused on the link between data and biological interpretation.

Presentation Guidelines:

  • Clearly state your research question
  • Show your code and figures in RStudio
  • Focus on figures and what they reveal
  • Discuss limitations and assumptions
  • Keep presentation within the allocated time
  • No PowerPoint is allowed

This session is an opportunity to transition from guided exercises to independent thinking. Focus on clarity. A simple, well-explained analysis is more valuable than a complex but unclear one.

Day 4, Thursday (04/06/26)

The day of Chlorobia ecology.

09:00 - 09:30: Introduction to the Chlorobia dataset

Introduction to Chlorobia: who they are, their ecology, and what type of data we can use to study their distribution in lakes.

10:00 - 17:00: R tutorial – Vertical distribution of Chlorobia

Guided analysis using phyloseq to explore the distribution of planktonic Chlorobia across lakes and depth. You will use the R script abundances_visualization_tutorial.R.

The following tables are needed for this exercise:

Learning Objectives:

  • Work with abundance, taxonomy, and metadata tables
  • Visualize Chlorobia distribution across lakes
  • Explore depth-resolved patterns
  • Relate abundance patterns to oxygen

Reference article:

You will start working on the project for Day 5.

Day 5, Friday (05/06/26)

The day of independent projects and presentations.

09:00 - 13:00: Independent project work

Time to continue working on:

  • Genome-resolved metagenomics using metaWRAP
  • The final R project

For the final project, you will work individually or in groups of two to develop your own research question using the data from Rodríguez-Gijón et al., 2023. Scientific Data.

You will also use the additional table SwedishLakes_abundances.csv.

Please follow these conditions:

  • Your research question must be answered with 1-2 figures created in R
  • The figure(s) must be completed as if prepared for publication

14:00 - 17:00: Student presentations

All groups or individuals will present their research question and results to the class. Please present your research question and figures in RStudio using the projector.

Each student (or pair) will present:

  • Research question
  • Approach and code
  • Key figures
  • Interpretation

Presentations should be concise and focused on the link between data and biological interpretation.

Presentation Guidelines:

  • Clearly state your research question
  • Show your code and figures in RStudio
  • Focus on figures and what they reveal
  • Discuss limitations and assumptions
  • Keep presentation within the allocated time
  • No PowerPoint is allowed

This session is an opportunity to transition from guided exercises to independent thinking. Focus on clarity. A simple, well-explained analysis is more valuable than a complex but unclear one.

Week 2

Day 6, Monday (08/06/26)

The day of data-driven strategies to survey environmental microbiomes and genome-resolved metagenomics.

09:00 - 11:00: An overview of data-driven strategies to survey environmental microbiomes

11:00 - 15:00: A read recruitment exercise to warm up

The purpose of this exercise is to give you direct exposure to the individual analysis steps and tools that enable one to recruit reads from metagenomes and to profile the read recruitment results to investigate gene distribution patterns of a given population.

Throughout this exercise you will use a mock dataset to (1) familiarize yourself with commonly used file formats such as FASTA, FASTQ, SAM, and BAM, (2) learn the basic steps of read recruitment through Bowtie2 and samtools, (3) learn how to profile read recruitment results using anvi’o, and (4) familiarize yourself with downstream steps of the analysis of recruited reads. Please try to be mindful of the individual steps, and make notes of any steps that did not make much sense to you so we can discuss them further if we have time.
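
To make the idea of profiling read recruitment results more concrete, here is a minimal Python sketch that is not part of the tutorial. It computes per-position coverage from a handful of made-up alignment intervals. The reference length, the interval list, and the function name `coverage_per_position` are all hypothetical; in the exercise itself, Bowtie2, samtools, and anvi’o do this work on real SAM/BAM files.

```python
def coverage_per_position(reference_length, alignments):
    """Count how many aligned reads cover each reference position.

    `alignments` holds hypothetical (start, end) pairs, 0-based with an
    exclusive end, standing in for reads recruited to one reference.
    """
    coverage = [0] * reference_length
    for start, end in alignments:
        # Clip to the reference so an odd interval cannot index out of range.
        for position in range(max(start, 0), min(end, reference_length)):
            coverage[position] += 1
    return coverage


# Made-up example: a 10 bp reference and three recruited reads.
mock_alignments = [(0, 5), (3, 8), (4, 10)]
coverage = coverage_per_position(10, mock_alignments)
mean_coverage = sum(coverage) / len(coverage)

print(coverage)
print(f"mean coverage: {mean_coverage:.2f}")
```

Positions where the intervals overlap end up with higher counts. This is the same kind of signal you will look at in coverage plots when you ask which parts of a genome are detected in a metagenome.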

You will find the exercise here: https://merenlab.org/tutorials/read-recruitment/

15:00 - 17:00: Genome-resolved metagenomics: opportunities and pitfalls

Day 7, Tuesday (09/06/26)

The day of pangenomics and metapangenomics.

09:00 - 10:30: Pangenomics: comparative genomics in the era of genomic explosion

10:30 - 12:00: Pangenomic analysis of a bacterial genus

This is a small exercise with pangenomics. Please download the data pack for this exercise at this Dropbox link.

The data pack contains 15 genomes for you to work with. While each genome belongs to the bacterial genus Bifidobacterium, you don’t know which species they belong to. Please take a look at the anvi’o pangenomics tutorial and/or the pangenomics exercise to find out how to create a pangenome for all these 15 genomes using the program anvi-pan-genome with default parameters, and answer the following questions in your short report:

  • How many single-copy core genes did you find?
  • When you organize genomes based on gene cluster frequencies, how many main groupings of genomes do you observe?
  • Which ‘species’ name would you annotate these genomes with?
  • According to gene clusters, which two species of Bifidobacterium in this mixture are most closely related?
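
As a conceptual aid for the first question, the sketch below shows one way to think about single-copy core gene clusters: clusters that contain exactly one gene from every genome. It is a toy illustration in Python with made-up data; the cluster IDs, genome names, and the function `single_copy_core` are hypothetical, and the filters anvi’o applies may differ in detail.

```python
def single_copy_core(gene_clusters, genome_names):
    """Return IDs of gene clusters with exactly one gene in every genome.

    `gene_clusters` maps a cluster ID to a dict of gene counts per genome;
    a genome missing from that dict contributes zero genes.
    """
    core = []
    for cluster_id, counts in gene_clusters.items():
        if all(counts.get(genome, 0) == 1 for genome in genome_names):
            core.append(cluster_id)
    return sorted(core)


# Made-up pangenome with three genomes and four gene clusters.
genomes = ["G1", "G2", "G3"]
mock_clusters = {
    "GC_01": {"G1": 1, "G2": 1, "G3": 1},  # single-copy core
    "GC_02": {"G1": 2, "G2": 1, "G3": 1},  # core, but two copies in G1
    "GC_03": {"G1": 1, "G2": 1},           # accessory: absent from G3
    "GC_04": {"G3": 1},                    # singleton
}

scg = single_copy_core(mock_clusters, genomes)
print(scg)
```

Only the first cluster qualifies here, because the others are either multi-copy in one genome or absent from at least one genome.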

Please include a screenshot of the final display you obtained with anvi-display-pan, and get cookie points for your pretty displays :)

Some optional questions for the overly enthusiastic:

  • What are some of the common features of the genomic islands that seem to be variable across individual genomes in this pangenome? Tip: you can have quick insights into genomic islands that occur only in some genomes by organizing gene clusters based on enforced synteny per genome.
  • What functions seem to differ between the main groups of genomes? Tip: you can use functional enrichment analyses to figure out if there are functions that systematically occur in one clade of Bifidobacterium but not the other.

13:00 - 15:00: Pangenomics analysis - continued

15:00 - 17:00: Metapangenomics: integrated interpretations of pangenomes and metagenomes

Day 8, Wednesday (10/06/26)

The day of phylogenomics.

09:00 - 10:30: Phylogenomics: inferring evolutionary relationships between microorganisms

10:30 - 12:00: Phylogenomic analysis of a bacterial genus

This is a small exercise in phylogenomics. Please use the same data pack from the pangenomics exercise to complete this one. Since you already have your contigs-db files for the genomes in that data pack, this should be extremely fast for you. But please start early to avoid any last-minute challenges :)

To solve this exercise, please apply phylogenomics principles to calculate a tree for the Bifidobacterium clade.

You can benefit from the tutorial on anvi’o phylogenomics workflow and see examples on how to get the necessary genes from genomes for phylogenomics. Reconstructing a final tree for these genomes with phylogenomics, and being able to explain why you have made certain choices to generate it, is the answer to this exercise.

Once you are done, please compare your phylogenomic tree to the dendrogram you have obtained from the pangenomic analysis. If you want to get fancy, feel free to include ‘additional’ Bifidobacterium genomes from other species in this genus :)

Day 9, Thursday (11/06/26)

The day of metabolism.

09:00 - 12:00: Inferring microbial metabolism in genomes and metagenomes

13:00 - 17:00: Comparative microbial metabolism

This is a small exercise in microbial metabolism analysis. Please find the data pack for this exercise at this Dropbox link.

The data pack contains four microbial genomes, and your task is to investigate which of these organisms (if any) are capable of nitrogen cycling. Please use anvi’o to annotate these genomes with KOfams, and then run anvi-estimate-metabolism to calculate the completeness of metabolic pathways in the KEGG MODULE database. You should examine the output of that program to identify the completeness scores for nitrogen cycling pathways in each genome. You will find a list of all KEGG modules for nitrogen metabolism at this link. This list contains seven pathways for nitrogen fixation, nitrate reduction, denitrification, and nitrification.

Your short report should answer the following questions:

  • Which nitrogen metabolism pathways are ‘complete’ in each genome? Please include in your answer their path-wise completeness scores and the score threshold that you are using (i.e., the value of the --module-completion-threshold parameter).
  • For the nitrifying organisms, which of the two nitrification reactions – the first conversion from ammonia to nitrite, or the second conversion from nitrite to nitrate – can they do? What evidence supports this?
  • When you’ve analyzed all of the genomes, please summarize your findings with a few sentences describing the following points:

    • which part(s) of the nitrogen cycle you found to be complete, and which part(s) were missing across all genomes
    • which genome(s) were capable of carrying out multiple nitrogen metabolism pathways, and which genome(s) had no nitrogen metabolism capabilities at all
    • other observations or hypotheses (if you have any) about these nitrogen cycle pathways, or the enzymes/gene annotations in these pathways, or why these genomes might have these capabilities or not, etc.
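
The role of the completeness threshold in the first question can be illustrated with a small Python sketch. All genome names and scores are made up, and `complete_modules` is a hypothetical helper, not an anvi’o function. The sketch assumes a pathway counts as ‘complete’ when its score is at or above the threshold, and uses 0.75 only as an example value.

```python
def complete_modules(scores, threshold):
    """Return, per genome, the pathways whose score is at or above `threshold`.

    `scores` maps a genome name to a dict of pathway name -> completeness
    score between 0 and 1 (hypothetical values, not real results).
    """
    result = {}
    for genome, pathways in scores.items():
        result[genome] = sorted(
            name for name, score in pathways.items() if score >= threshold
        )
    return result


# Made-up path-wise completeness scores for three imaginary genomes.
mock_scores = {
    "genome_1": {"nitrogen fixation": 1.0, "denitrification": 0.5},
    "genome_2": {"nitrification": 0.75, "denitrification": 0.25},
    "genome_3": {"nitrogen fixation": 0.0},
}

threshold = 0.75  # example value only
report = complete_modules(mock_scores, threshold)
for genome, pathways in report.items():
    print(genome, pathways)
```

Changing the threshold changes which pathways are reported as complete, which is why your report should always state the value you used alongside the scores.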

And here are some optional things to include in your report, if you have the time or interest :)

  • Determine the taxonomic identity of each genome. Does the genome’s metabolic capacity match what you would expect, based on known research about its taxonomic clade?
  • Visualize the metabolism estimation results across the four genomes as a heat map, and add a screenshot of the heat map to your report. You can find examples of how to create the heat map in the tutorials linked below (but feel free to use a different way to do it, too)

You might find some of the resources below helpful as you do this exercise:

Day 10, Friday (12/06/26)

The day of microbial population genetics.

09:00 - 10:30: Microbial population genetics: tools, terminology, and open questions

10:30 - 12:00: Structure-informed interpretations of microbial population genetics

13:00 - 15:00: Population genetics of a cryptic plasmid

This is a small exercise on microbial population genetics. The exercise aims to help you familiarize yourself with the population genetic signal recovered from metagenomes through single nucleotide variants, and sharpen your ability to answer some key questions using such data. You can download the data pack from here, in which you will find an anvi’o profile database and a contigs database that contains all the data you will need to be able to solve the following puzzle.

The contigs database is generated from a single plasmid, and the merged profile database contains the metagenomic read recruitment data that puts this plasmid in the context of 12 human gut metagenomes. The gut metagenomes are a subset of the data published in this study, in case you are interested in taking a look. Briefly, the subset of the data that is profiled here includes 6 gut metagenomes from mothers and 6 gut metagenomes from their infants. But you don’t know the real infant-mother pairs :)

Your task is to investigate single-nucleotide variants (SNVs) found in the read recruitment results and answer the following questions:

  • As far as this dataset goes, would one argue that the plasmid is acquired from random sources upon birth, or is there evidence to suggest it is vertically transmitted from mothers to infants?
  • If it is vertically transmitted, can one confidently identify mother-infant pairs?

To answer these questions you can get inspiration from strategies mentioned in this tutorial. If you want a refresher on SNVs, you may want to take a look at this blog post.

You can (and should) inspect the coverage plots for all of the mothers and infants (using the program anvi-interactive), but if you determine that the plasmid is vertically transmitted and you think you can identify mother-infant pairs, you are invited to create a final figure that summarizes the evidence for it.

If you believe there is enough signal to answer this, please try to figure out which mother matches which infant, and be prepared to prove your conclusion!
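
One generic way to compare samples using SNVs is to measure how different their allele frequencies are at shared variable positions. The Python sketch below illustrates that idea with made-up numbers; the sample names, positions, frequencies, and both helper functions are hypothetical. It ignores coverage and other factors you should consider in the real analysis, so treat it as a starting point for your own strategy rather than a solution.

```python
def mean_abs_difference(freqs_a, freqs_b):
    """Mean absolute allele-frequency difference over shared positions.

    Each argument maps a nucleotide position to the frequency (0 to 1) of
    one chosen allele in a sample. Returns None if no position is shared.
    """
    shared = sorted(set(freqs_a) & set(freqs_b))
    if not shared:
        return None
    return sum(abs(freqs_a[p] - freqs_b[p]) for p in shared) / len(shared)


def closest_sample(query_freqs, candidates):
    """Return (name, distances) for the candidate most similar to the query."""
    distances = {}
    for name, freqs in candidates.items():
        distance = mean_abs_difference(query_freqs, freqs)
        if distance is not None:
            distances[name] = distance
    if not distances:
        return None, distances
    return min(distances, key=distances.get), distances


# Made-up allele frequencies at three variable positions.
mock_mothers = {
    "M1": {10: 1.0, 25: 0.0, 40: 0.5},
    "M2": {10: 0.0, 25: 1.0, 40: 0.5},
}
mock_infant = {10: 1.0, 25: 0.0, 40: 0.25}

best, distances = closest_sample(mock_infant, mock_mothers)
print(best, distances)
```

A small distance to one sample and large distances to all others would be consistent with a shared source, but whether such a pattern exists in the real data is for you to determine.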

15:00 - 17:00: Open lab

Discussions, revisiting old topics, and preparations for the next week.

Weeks 3 and 4

During these weeks, you will transition from guided exercises to independent research. Using the datasets and approaches introduced throughout the course, you will design and develop your own research question and address it through data analysis and visualization.

This phase emphasizes:

  • Independent thinking
  • Data-driven reasoning
  • Clear scientific communication

June 15 – June 24: Independent work on your project

During this period, you will:

  • Define and refine your research question
  • Analyze data using the tools introduced in the course
  • Generate figures that support your conclusions

June 25: Presentation preparation

You will use this day to:

  • Organize your results
  • Prepare your presentation

June 26: Symposium

Each student (or group) will present their project to the class.

  • 15-minute presentation
  • 20 minutes of discussion and questions

Project Expectations

Your project should be based on a clear and testable research question derived from the datasets explored in the course.

You are expected to:

  • Use appropriate analytical approaches
  • Generate 1-3 publication-quality figures
  • Interpret your results in an ecological context

Presentation Guidelines

Your presentation should clearly communicate the work you have developed during the course.

  • i. Background: Provide context for your study and explain why the question is relevant
  • ii. Research question: Clearly state the question you aimed to address
  • iii. Methods and rationale: Describe the analytical approach and explain why it was appropriate
  • iv. Results: Present your key findings using figures
  • v. Conclusion: Summarize your interpretation and discuss implications and limitations

This stage of the course is designed to consolidate all previous components: conceptual understanding, computational skills, data analysis, and interpretation.

Focus on clarity and coherence. A well-defined question and a clear answer are more valuable than a complex but unfocused analysis.

Faculty and Communication

The following table lists individuals who will be involved in the course, and their contact information:

Name                    Role                 Contact information
Sarahi Garcia           Instructor           sarahi.garcia@uni-oldenburg.de
Jessika Füssel          Instructor           jessika.fuessel@uol.de
Florian Trigodet        Instructor           florian.trigodet@hifmb.de
Samuel Hürten           Supervisor Week 3-4  samuel.huerten@uni-oldenburg.de
Chandni Sidhu           Supervisor Week 3-4  chandni.sidhu@uni-oldenburg.de
Anis Hosseini           Teaching Assistant   anis.hosseini@uni-oldenburg.de
Ghazaleh Sheikhi Ghahi  Teaching Assistant   ghazaleh.sheikhi.ghahi@uni-oldenburg.de

Description and Learning Objectives

The oceans are home to many microorganisms. In fact, microbial cells in the oceans outnumber the stars in the known Universe. These countless microorganisms constitute slightly over half of the total biomass in the marine environment, playing a crucial role in maintaining the delicate balance of biogeochemical cycles on Earth.

In our course, we delve into the fundamentals of computational approaches that now grant unprecedented access to these communities through innovative ‘omics strategies. Acquiring a comprehensive understanding of these strategies, including their appropriate applications and limitations, has become an essential skill for any aspiring life scientist. The primary objective of this course is to empower participants to explore the ecology, evolution, and functionality of naturally occurring microbial populations, while grasping the current conceptual framework that aids our comprehension of the most diverse life forms on our planet.

Over the span of two weeks, our course unfolds with a series of lectures and practical exercises, acquainting participants with the foundational concepts of ‘omics strategies. They will delve into the theoretical foundations of prominent ‘omics data types and their contemporary uses, encompassing genomics, metagenomics, metatranscriptomics, and various ‘omics data analysis methodologies such as genome reconstruction from metagenomes, general visualization strategies for large ‘omics datasets, metabolic reconstruction in genomes and metagenomes, metagenomic read recruitment, pangenomics, phylogenomics, and microbial population genetics.

Moving into the third week, students will embark on small-scale research projects conducted in groups of up to three students. Each group will be presented with or will develop research questions concerning the ecology of marine microorganisms. Collaboratively, students, alongside instructors and supervisors, will strategize and devise methodologies to address these research questions. Following the designed strategies, students will then execute their plans to find answers to the research questions. Finally, they will present their findings to the other groups, fostering a dynamic exchange of insights and perspectives.

The learning objectives of the course include the following:

  • To apply state-of-the-art ‘omics approaches to various data types to make sense of complex datasets.
  • To engage with research questions and learn about designing strategies to answer the research questions.
  • To practice a different set of analytical or computational methods to answer research questions regarding microbial ecology.
  • To improve discussion, analytical, presentation and writing skills.

Prerequisites

  1. To maximize benefit, the participants of this course are expected to be familiar with the central dogma of molecular biology, to be able to explain what a gene, a genome, a transcript, or a protein is, and to have at least a preliminary understanding of the principles of ecology and evolution, such as the basics of taxonomy and the broad ecological principles that maintain complex ecosystems.

  2. Throughout the course we will use anvi’o for ‘omics analyses. Anvi’o is an open-source software platform that brings together many aspects of today’s cutting-edge computational strategies of data-enabled microbiology, including genomics, metagenomics, metatranscriptomics, pangenomics, metapangenomics, phylogenomics, and microbial population genetics in an integrated and easy-to-use fashion through extensive interactive visualization capabilities. Anvi’o has been cited over 1,000 times in the literature and is actively maintained. Participants are required to have access to a personal computer, to install anvi’o, and to bring their computer with anvi’o installed to the classroom. If you need to borrow a computer, the University can arrange that; just contact the course coordinator.

  3. The participants will also engage with R. Please install R and RStudio (available for all operating systems). If you run into problems installing them, please contact it@icbm.de (if possible, attach a screenshot of the error message).

  4. Install MobaXterm (for Windows users only). MobaXterm is a simple Windows program to connect to remote Linux computers.

  5. The participants of this course are also expected to be familiar with the UNIX shell (also known as the ‘terminal environment’, or ‘command line interface’). Many of the students will have taken the course Programming for Life Scientists. However, if you have no prior experience with the command line interface, that is OK: you will build those skills throughout the course, since the vast majority of our data analyses will take place in the command line interface. Arguably, exposure to the command line environment and developing a level of mastery of it will be one of the most impactful gains from this course, and it will help you throughout your professional journey in almost any career path that involves data. So if you are not familiar with the command line environment, see this as an opportunity to invest time in developing some skills in it. You can use the following material to familiarize yourself with the command line interface.

    • Beginner’s Guide to the Bash Terminal (a video introduction to the Linux command line environment; although Joe Collins is talking about Linux, the topics are relevant to anyone who uses a command line environment, and we strongly recommend that everyone watch it in its entirety and try to replicate the commands).
    • Learning the Shell (a chapter from the open book “The Linux Command Line” by William Shotts – highly recommended).

  6. The course will require its participants to read and understand contemporary literature written in English.

Attendance Policy

Each participant is expected to attend each lecture in person (unless a legitimate reason for absence that is recognized by the University is in effect).

Evaluation and Grading

The evaluation in this course will be based on five parts of a portfolio.

Part I (10% of your grade)

The first week of the course consists of lectures and exercises that familiarize you with the terminal, R, and RStudio, as well as with asking research questions and writing code that helps you visualize answers to your questions using large datasets. This week there will be 4 small projects; completing them earns you 10% of the grade.

Part II (10% of your grade)

For the second week, the grade is based on class attendance and it will be recorded by a strategy we call class citizenship, which aims to help the course director to have an overall understanding of the evolution of the course.

Class citizenship requires every participant to send a class citizenship email to jessika.fuessel@uol.de, florian.trigodet@hifmb.de, anis.hosseini@uni-oldenburg.de and ghazaleh.sheikhi.ghahi@uni-oldenburg.de at the end of each day. The class citizenship email must be composed of two parts:

  • A brief summary of the main concepts discussed during the day, interpreted by the attendee in their own words.

  • One or more short questions that is/are relevant to concepts or ideas discussed throughout the day, yet remained unclear.

The last 15 minutes of every day will be dedicated to class citizenship emails, so that attendees can end their day without having to remember to do it later.

The title of the class citizenship email must follow this pattern exactly:

EMM Class Citizenship: DD/MM/YY

For instance, the following would be the appropriate title for this email for the first day:

EMM Class Citizenship: 01/06/26

The best class citizenship emails are those that are brief, genuine, and insightful. In an ideal world the emails should be no less than 50 words, and no more than 250 words. Please do not send notes you take throughout the class. You should use the last 15 minutes of the lecture to gather your thoughts, and come up with a summary of what you can remember. Here is an example class citizenship email:

Summary: Today we discussed what is phylogenomics, how phylogenomic trees are built, and why single-copy core genes are suitable for building phylogenomics trees. We also discussed the relationship between phylogenetics, phylogenomics, and pangenomics with respect to the fraction of genome used and the evolutionary distance that they can cover.

Question: Since phylogenomics and pangenomics are both useful for inferring evolutionary distances, it seems to me that integrating both methods in a systematic way would yield a more reliable tree. But it looks like the field only uses phylogenomics and pangenomics separately, is there a reason for that?
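
If you would like to double-check your email before sending it, the title pattern and the suggested length can be verified with a few lines of Python. This is only a convenience sketch: the function names `citizenship_title` and `word_count_ok` are hypothetical, and the 50 to 250 word range is the guideline given above rather than a hard rule.

```python
from datetime import date


def citizenship_title(day):
    """Build the email title in the required 'DD/MM/YY' pattern."""
    return f"EMM Class Citizenship: {day.strftime('%d/%m/%y')}"


def word_count_ok(body, minimum=50, maximum=250):
    """Check whether the email body falls within the suggested word range."""
    return minimum <= len(body.split()) <= maximum


# The first day of the course, as listed in the schedule.
title = citizenship_title(date(2026, 6, 1))
print(title)

# A made-up body of 60 words, just to exercise the length check.
mock_body = " ".join(["word"] * 60)
print(word_count_ok(mock_body))
```

For the first day, this produces the same title as the example shown earlier in this section.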

Part III (30% of your grade)

Starting in week 3 you will be divided into groups, and you will start working towards a methodological strategy to answer research questions. During the following 8 days you will be planning and executing data collection or data analysis. All methods and results must be compiled in a journal. A copy of this journal must be handed in by June 24th. This journal will then be graded by your course supervisor.

Part IV (10% of your grade)

This part of your grade relates to teamwork commitment and will be evaluated by both your supervisor and your team-mates.

Part V (40% of your grade)

The biggest part of your grade will consist of a presentation to be given on June 26th. In this presentation you will, as a group, explain to the rest of the class the project you have worked on during the course. The presentation will be at most 15 minutes long and must contain:

  • i. background to your project
  • ii. research question
  • iii. methods and methods rationale
  • iv. results
  • v. conclusion

You will have the entire day on June 25th to work with your team and finish and polish the presentation. The presentation will be followed by up to 20 minutes of questions.

The grading scale for this module is as follows:

Grade Threshold
1.0 95%
1.3 90%
1.7 85%
2.0 80%
2.3 75%
2.7 70%
3.0 65%
3.3 60%
3.7 55%
4.0 50%
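
To see how the five portfolio parts and the grading scale fit together, here is a small Python sketch. It rests on several assumptions that are not stated in this document: each part is scored from 0 to 100, each threshold is the minimum percentage for that grade (inclusive), and totals below 50% have no grade in the table, so the function returns `None` for them. The names `total_percentage` and `grade_for` and the example scores are made up.

```python
# Portfolio weights in percent, as listed in the evaluation section.
PORTFOLIO_WEIGHTS = {
    "Part I": 10,
    "Part II": 10,
    "Part III": 30,
    "Part IV": 10,
    "Part V": 40,
}

# (minimum percentage, grade), highest threshold first.
GRADE_THRESHOLDS = [
    (95, "1.0"), (90, "1.3"), (85, "1.7"), (80, "2.0"), (75, "2.3"),
    (70, "2.7"), (65, "3.0"), (60, "3.3"), (55, "3.7"), (50, "4.0"),
]


def total_percentage(part_scores):
    """Weighted total, assuming each part is scored from 0 to 100."""
    weighted = sum(
        PORTFOLIO_WEIGHTS[part] * part_scores[part] for part in PORTFOLIO_WEIGHTS
    )
    return weighted / 100


def grade_for(percentage):
    """Return the grade for a percentage, or None below the lowest threshold."""
    for threshold, grade in GRADE_THRESHOLDS:
        if percentage >= threshold:
            return grade
    return None  # the table lists no grade below 50%


# Made-up scores for one imaginary student.
mock_scores = {
    "Part I": 100, "Part II": 100, "Part III": 80, "Part IV": 90, "Part V": 85,
}
total = total_percentage(mock_scores)
print(total, grade_for(total))
```

Because the presentation (Part V) and the journal (Part III) together carry 70% of the weight, they dominate the total in this calculation.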