Good Practices

A talk for Barcelona Supercomputing Center’s Ph.D. students

Andrés Aravena, PhD

October 6, 2023

Welcome to Ph.D. student life

Congratulations, by the way

Doing a Ph.D. is an amazing experience

But it can also be stressful

We want to present some ideas on how to (successfully) survive your Ph.D. 

Some of them are endorsed by research

Some of them are personal opinions

(informed by shared experiences)

I am Andrés Aravena

  • Assistant Professor at the Molecular Biology and Genomics Department, Istanbul University
  • Mathematical Engineer, U. of Chile
  • PhD Informatics, INRIA–U Rennes 1, France
  • PhD Mathematical Modeling, U. of Chile
  • Research interests:
    • machine learning for metagenomics and antibiotic resistance
    • statistical analysis of gene expression for systems biology

Focus on Philosophy, not Tools

Tools change over time. There will be new tools

You probably use tools that did not exist 10 years ago

And they are often a matter of personal taste

So we will focus on the philosophy of the tools

(i.e. the part that will not change)

Why do we need
good practices?

We need good practices, because

our mind fools us

We think we will never forget, but we do

“I remember it now, therefore I will remember it forever”

When we see something or learn something, this fact is present in our short-term memory and we feel like we will always remember it

We forget that we forget

Solution: Use a journal (or lab notebook, or blog)

We think our memories correspond to facts, but often they do not

“Things were exactly as I remember”

Research shows that our memory is not at all a “recorder”

We misremember a lot

Solution: Use a journal

We are bad at estimating projects’ complexity

We think that we can finish a project in less time than it will really take

Solution:

  • Write in your journal how much time you worked every day
    • Tools like Toggl Track can also be used
  • Reflect on how you used your time

Good practice 1

Use a journal or a lab notebook

Laboratory notebook

In experimental sciences we record every experiment in a paper notebook

  • What is the purpose of the experiment
  • What is the expected output
  • What was the result, positive or negative
  • What were the lessons learned

Kanare, H. M. (1985). Writing the laboratory notebook. American Chemical Society.

Logbooks and Commonplace books

In the navy it is a standard practice to log everything

The logbook was the 18th-century version of a plane’s black box

It was also typical for writers to carry a notebook to write notable extracts from texts

This was called a Commonplace book

Some other people used to write a personal journal or diary

Bullet journal

Nowadays it is fashionable to combine a logbook, commonplace book, journal, and to-do list

A bullet journal (BuJo) is

  • paper based: easy to carry, not distracting
  • numbered pages: easy to index and reference
  • reviewed daily

Don’t be fooled by the fancy BuJos you see on the web. They do not need to be beautiful

Just get a simple notebook and visit https://bulletjournal.com/

Carroll, Ryder. 2018. The Bullet Journal Method: Track the Past, Order the Present, Design the Future. New York: Portfolio, Penguin.

Key Idea

  • Do not trust your memory
    • “Your mind is for having ideas, not holding them”
  • Write how you solved each problem every day
  • Write what you learned every day
    • “Today I learned…”
  • Keep an index for easy retrieval
  • Review your notes periodically and reflect
    • “Have a conversation with your past/future self”

First quotation is from
Allen, D. (2015). Getting things done: The art of stress-free productivity. Penguin Books.

More reasons for good practices

(other ways our mind fools us)

We think that everybody knows what we know, so they do not need explanations

This is the curse of knowledge

“I understand it, so everybody understands it”

It is the main reason why our text is hard to read

Solution: This one I’m still trying to figure out. Practice.

We think that everything we do is easy

This is Impostor Syndrome

“I’m not really that good, and one day they will realize I don’t know anything”

We learn a little every day, so it never feels hard

But learning accumulates over a long period,
and it is hard to see how much we have learned

Solution: Look at your journal and reflect on how much you have learned in the last year

We don’t know that we don’t know

This is the Dunning-Kruger effect

“Incompetent, and unaware of it”

It is hard to improve if we don’t know we are bad

Solution: Be open to criticism of your work

Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one’s own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77(6), 1121–1134.

You are not your work

Two sides of the coin

Impostor Syndrome and the Dunning-Kruger effect are mismatches between our self-perception and how other people see us

To solve that, we can improve our Communication with colleagues and collaborators

The goal of a Ph.D. is to produce and communicate new knowledge

(we call it “Doing Science”)

The key word here is communicate

What is the value of a result that is not made public?

Keyword 1: Communication

We communicate with our collaborators

Most research is done in teams

Good practices help teamwork, by:

  • Keeping track of what was (or was not) done
  • Coordinating next steps
  • Avoiding work duplication

…but I work alone…

Even if you work alone, you are still communicating

  • with your supervisor or advisor
  • with the referees of your paper
  • with other scientists that read (and cite) you
  • with the next Ph.D. student in your lab
  • with the general public
  • with your future self

Each one of these interactions can be improved by following good practices

Communicate with your supervisor

Research results are not enough

You must convince your boss (and the jury) that you deserve to be called “Doctor”

  • Make your work easy to understand

  • Make clear what your original contribution is

…with the referees of your paper

Referees are busy people who work for free

  • Give them all they need to replicate and validate your work

  • Being clear and transparent helps them to decide fast

You will get published faster
(or at least get good feedback)

…with other scientists in your field…

…that will read your paper (and hopefully cite it)

The game does not end when you publish

50% of papers are read only by the referee

  • Make your work easy to understand and replicate

Evans, J. A. (2008). Electronic Publication and the Narrowing of Science and Scholarship. Science, 321(5887), 395–399.

…with the general public

Eventually, your work will have an impact outside academia

(the end goal is to make a better world, no?)

We need to be aware of the ethical implications

  • Access, licensing, copyright models
  • Privacy of test subjects
  • Truth and academic integrity

…with your future self

Nothing is more frustrating than reading your old work

As they say: “The past is a foreign country”

Undocumented code/protocols are hard to understand…

and you can only blame yourself

Email

(also applies to WhatsApp, Slack, etc.)

Essential parts

An email should provide just enough information to answer these five questions:

  • Who are you?
  • What do you want?
  • Why are you asking me?
  • Why should I do what you’re asking?
  • What is the next step?

Guy Kawasaki, cited in Vozza, Stephanie. 2013. ‘Productivity Lifesaver: The 5-Sentence Email’. Entrepreneur. https://morideno.com/write-five-sentences-about (October 3, 2023).

Consider time zones

If you collaborate with people abroad, remember that your 10am may not be their 10am

Sometimes your “tomorrow” is not their “tomorrow”

Be explicit about the weekday, the date and the time

Use GMT/UTC-based time zones.
Other abbreviations are ambiguous

  • AMT is Armenia Time or Amazon Time
  • CEST = CEDT = ECST = MESZ = UTC+2
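
A minimal sketch of writing a time explicitly, using only Python's standard datetime module. The meeting time and the two fixed UTC offsets (UTC+2 and UTC+3) are made-up values for illustration. Real time zones change their offset with daylight saving time.

```python
from datetime import datetime, timedelta, timezone

def explicit_time(moment: datetime) -> str:
    """Format a timezone-aware datetime with weekday, date, time and UTC offset."""
    return moment.strftime("%A %Y-%m-%d %H:%M UTC%z")

# Hypothetical meeting: 10:00 for a sender at UTC+2, read by a colleague at UTC+3
sender_zone = timezone(timedelta(hours=2))
reader_zone = timezone(timedelta(hours=3))
meeting = datetime(2023, 10, 6, 10, 0, tzinfo=sender_zone)

print(explicit_time(meeting))                            # sender's clock
print(explicit_time(meeting.astimezone(reader_zone)))    # reader's clock
print(explicit_time(meeting.astimezone(timezone.utc)))   # common reference
```

Writing the UTC offset next to the local time lets every reader convert it without guessing what an abbreviation means.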

Be Explicit

Not too long

“Long emails are either unread or, if they are read, they are unanswered … Right now I have 600 read but unanswered emails in my inbox.”

Guy Kawasaki, cited by Stephanie Vozza in
‘Productivity Lifesaver: The 5-Sentence Email’

Entrepreneur website. https://morideno.com/write-five-sentences-about (October 3, 2023).

Five sentences

“A Disciplined Way To Deal With Email”

E-mail takes too long to respond to, resulting in continuous inbox overflow for those who receive a lot of it.

Treat all email responses like SMS text messages, using a set number of letters per response. Since it’s too hard to count letters, we count sentences instead.

five.Sentenc.es http://www.five.sentenc.es/

Implementing “five sentences”

Write this as your signature

--------------------------------------------
Q: Why is this email five sentences or less?
A: http://five.sentenc.es


Make it easy to notice

When someone gets many emails,

they decide which ones to read based on:

  • Who sent it
  • What is it about

That is, based on your name and the subject

Does this work?

Always write a Subject

The Subject should say why the message is worth reading

  • Good: short and to the point

    “Want to introduce my colleague. Coffee Tuesday or Wednesday?”

  • Bad examples:

    “(No subject)”, “message”, “hello”

You can even say everything in the subject

“We wait for you at classroom 1 [EOM]”

Here “[EOM]” means “[End Of Message]”

This shows that there is nothing more to say

All the message is in the subject

No need to open the email

What about this one?

Have I seen this person before?

Choose your picture well

Most people are much better at recognizing faces than names

Some email platforms allow you to show your picture

(also applies to WhatsApp and similar apps)

Your picture should show your face clearly

And about this one?

Always include your full name

Don’t make people guess.
Write your name the way you want to be called

Bad if too short or too long:

  • Pablo
  • Pablo Diego José Francisco de Paula Juan Nepomuceno María de los Remedios Cipriano de la Santísima Trinidad Ruiz y Picasso

Good if it is the name you would like people to call you

  • Pablo Picasso

“Pablo Picasso.” (2023). In Wikipedia. https://en.wikipedia.org/wiki/Pablo_Picasso

Write it backwards

It is easy to press SEND before attaching a file
or before writing the subject

A good way to never forget them is to

  1. Attach the files
  2. Write the text explaining the attachment
  3. Write a one-phrase summary as subject
  4. Write the recipient’s email address
  5. Press SEND

You cannot press SEND until you write the recipient’s email

Attachments

Email was designed for text. Plain text

It cannot handle “binary” (non-text) data

To attach a picture/document, it is encoded as text

This increases the file size by 33%
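
The 33% figure can be checked with a small sketch. It assumes the attachment is encoded with Base64, the usual encoding for email attachments. The random payload is only a stand-in for a real file.

```python
import base64
import os

payload = os.urandom(300_000)          # stand-in for a binary attachment
encoded = base64.b64encode(payload)    # text-safe representation

# Base64 turns every 3 bytes into 4 text characters
overhead = len(encoded) / len(payload) - 1
print(f"{len(payload)} bytes -> {len(encoded)} bytes (+{overhead:.0%})")
```

Real messages also add line breaks and headers, so the actual increase is slightly larger.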

Use attachments only if necessary

Worst offenders: short Word files, which could be copied and pasted into the email body

Use a shared folder in the cloud instead
(more on that later)

Exception: To leave an explicit trace of a given document at a fixed date

(for example, students’ homework)

Collaborating


Sharing Word documents by email is a VERY BAD IDEA
It leads to chaos and confusion

Use an Online service

You can share your document via Dropbox or Google Drive

You can edit online using Microsoft Office 365 or Google Docs

Several people can work on the same document at the same time

Advantage: better spelling and grammar correction

But they require a permanent internet connection

Where to store it

  • On the server only

  • Cloud drive like Dropbox, Google Drive

    • Good to share large data and non-text files
    • Bad if two people change the same file
    • Works better with permanent internet access
  • Version control system like Git, hosted on GitHub, GitLab, or Bitbucket

    • Good for text and code, bad for big files
    • Keeps history
    • Works well without internet access

Never use Git in a shared folder

The repository can easily become corrupted

Sharing

  • Hybrid, using symbolic links

  • Or use an online editor

    • Google Docs
    • HackMD.io
    • Overleaf

Folder structure

Prepare your files for the next user

Someone unfamiliar with your project should be able to look at your computer files and understand in detail what you did and why

 

The ideas of this section are mostly based on
William Stafford Noble. “A Quick Guide to Organizing Computational Biology Projects.” PLoS Computational Biology 5, no. 7 (2009): 1–5. https://doi.org/10.1371/journal.pcbi.1000424.

This “someone” could be

  • someone who wants to try to reproduce your work,
  • a collaborator who wants to understand your experiments,
  • a future student in your lab extending your work
    • after you have moved on to a new job,
  • your research advisor evaluating your research skills.

Most commonly, however, that “someone” is you.

William Stafford Noble. “A Quick Guide to Organizing Computational Biology Projects.” PLoS Computational Biology 5, no. 7 (2009): 1–5.

Everything you do, you will probably have to do over again

Folder structure for data projects

Role of each folder

  • docs is where you write your paper/talk/thesis
  • data is anything that you get from outside the computer
  • results is what your code produces
  • code is where you write your code
  • bib to store documents cited in your document
    • if it has a doi, it goes here
    • bibliographic database goes here
  • extra for other documents without doi

Use a script to build the structure

Cookiecutter is a Python tool to create new projects

You can search for recipes on GitHub with a query like topic:cookiecutter topic:r
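
Even without Cookiecutter, a few lines of Python can build the folders. This is a minimal sketch using only the standard library, not a Cookiecutter recipe. The folder names come from the list of roles above. The project name "my-project" is hypothetical.

```python
import tempfile
from pathlib import Path

# Folder names taken from the list of roles above
FOLDERS = ["docs", "data", "results", "code", "bib", "extra"]

def create_project(root: Path) -> list[Path]:
    """Create the project skeleton under root and return the folders."""
    created = []
    for name in FOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)  # safe to run twice
        created.append(folder)
    return created

# Demonstration in a temporary directory, so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    for folder in create_project(Path(tmp) / "my-project"):
        print(folder.relative_to(tmp))
```

Using a script means every project starts with the same layout, so collaborators always know where to look.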

Raw Data is Sacred

Producing data is expensive and time consuming

You don’t want to lose it. Mark it read only immediately
(and make backups)

Never modify raw data. Use a script to make a clean version

Use folders raw and clean inside data/YYYY-MM-DD
Keep the code for that in scripts
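
Marking files read only can also be scripted. The sketch below is one possible way, assuming the raw files live in a folder such as data/2023-10-06/raw. The date and the sample file are invented for the demonstration.

```python
import stat
import tempfile
from pathlib import Path

def protect_raw_data(raw_dir: Path) -> int:
    """Remove write permission from every file under raw_dir; return the count."""
    no_write = ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH)
    count = 0
    for path in raw_dir.rglob("*"):
        if path.is_file():
            path.chmod(path.stat().st_mode & no_write)
            count += 1
    return count

# Demonstration with a made-up raw data file in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "data" / "2023-10-06" / "raw"
    raw.mkdir(parents=True)
    (raw / "sample.csv").write_text("id,value\n1,3.14\n")
    print(protect_raw_data(raw), "file(s) marked read-only")
```

Read-only permissions protect against accidents, not against disk failure, so backups are still needed.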

Each folder needs a README file

Good filenames help a lot to understand the project

But they are usually not enough

A README file in each folder can explain the purpose of each file

It takes time to write them, but it saves time in the long run

Define your projects

What is a “project”?

We can distinguish four categories

  • Projects with well-defined goals and deadlines, e.g. a thesis
  • Areas that are permanently active, like “health” or “family”
  • Resources that can be useful for several projects, like code libraries, or general interest papers
  • Archives, anything that is no longer active. Can be copied to external media and stored out of the computer

Each one requires a separate folder

Forte, Tiago. 2022. Building a Second Brain. Simon and Schuster.

Spaces

Personally I like to group my Projects/Areas/Resources/Archives by major topic

  • Teaching
    • Each course is a project
  • Research
  • Work
    • Contracts, bureaucracy
  • Personal
    • Health, Bank, Travel, Family
  • Learning
  • Hobby

Filenames

Be consistent when choosing filenames

Decide when to use ., -, and _

Avoid spaces in filenames

Either John-Smith.txt or John_Smith.txt

Usually . separates filetypes, like .csv or .yml

Define a standard with your collaborators

Check periodically that you are following your standard
(maybe with a script)
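
Such a checking script can be very short. This sketch assumes one possible standard: words joined by "-", no spaces, and a single "." before a lowercase extension. The pattern and the sample names are illustrations. Adapt them to whatever you agree on with your collaborators.

```python
import re

# Assumed standard: words joined by "-", no spaces, one "." before the extension
PATTERN = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*\.[a-z0-9]+")

def nonconforming(filenames):
    """Return the filenames that do not follow the agreed standard."""
    return [name for name in filenames if not PATTERN.fullmatch(name)]

names = [
    "01-Introduction.docx",
    "2_Methods.docx",
    "3.Results.docx",
    "4 discussion.docx",
    "2009-01-03-results.txt",
]
print(nonconforming(names))
```

To check a real folder, the list of names could come from pathlib, for example `[p.name for p in Path("docs").iterdir()]`.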

Examples

Bad Example

1-Introduction.docx
2_Methods.docx
3.Results.docx
4 discussion.docx
10-conclusions.docx
results-01-03-09.txt

Good Example

01-Introduction.docx
02-Methods.docx
03-Results.docx
04-Discussion.docx
10-Conclusions.docx
2009-01-03-results.txt

Another Good Example

01_Introduction.docx
02_Methods.docx
03_Results.docx
04_Discussion.docx
10_Conclusions.docx
20090103results.txt

Both are good, but use only one

Write dates as YYYY-MM-DD

  • When was 8/3/1965? August or March?

  • Is today 6/10/2023 or 10/6/2023?

It is better to write YYYY-MM-DD. This is the ISO 8601 standard

There is no ambiguity of meaning

Sorting alphabetically, numerically, and chronologically gives the same result
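
The sorting property is easy to verify. In this small sketch the four dates are arbitrary examples. Each one is written both as day/month/year and as YYYY-MM-DD, and then sorted as plain text.

```python
from datetime import date

# Arbitrary example dates
dates = [date(2023, 10, 6), date(1965, 3, 8), date(2009, 1, 3), date(2023, 2, 11)]

dmy = sorted(d.strftime("%d/%m/%Y") for d in dates)   # day/month/year as text
iso = sorted(d.isoformat() for d in dates)            # YYYY-MM-DD as text

print(dmy)   # text order, but not chronological order
print(iso)   # text order and chronological order agree
```

This is why filenames starting with a YYYY-MM-DD date appear in chronological order in any file browser.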

Structured Documents


You probably know that using a good data structure can dramatically improve an algorithm

And you use structured programs

The same applies to structuring our documents

Maybe you have used LaTeX, or Markdown

Maybe you know HTML

Separation of concerns

The key idea is to describe what things are, not how they look

Describe the role of text, not the “looks”

Separate style from structure

This part is based on the ideas discussed in “LaTeX: A Document Preparation System” by Leslie Lamport (1986).
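
The same idea can be shown in a few lines of code. This is only an illustrative sketch, not how LaTeX works internally. The document is a list of (role, text) pairs, and the two style tables are invented for the example.

```python
from html import escape

# The document says what each piece of text is, not how it looks
document = [
    ("title", "Good Practices"),
    ("section", "Raw Data is Sacred"),
    ("paragraph", "Never modify raw data."),
]

# Two interchangeable styles for the same structure
MARKDOWN = {"title": "# {}", "section": "## {}", "paragraph": "{}"}
HTML = {"title": "<h1>{}</h1>", "section": "<h2>{}</h2>", "paragraph": "<p>{}</p>"}

def render(doc, style, transform=str):
    """Apply a style to every (role, text) element of the document."""
    return "\n".join(style[role].format(transform(text)) for role, text in doc)

print(render(document, MARKDOWN))
print(render(document, HTML, transform=escape))
```

Changing the look means changing one style table. The document itself stays untouched.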

It is like a house

Structure makes the house solid and comfortable

If you only do decoration, the house looks nice but it is not solid

The structure of the walls comes first

Painting the walls in a nice color is secondary

Structural elements

  • Sections, subsections, paragraphs
  • Figures and Tables
  • Lists
  • References
  • Equations
  • Metadata
    • Title
    • Authors
    • Affiliations
    • Dates: submission, acceptance
    • Media/format

Final comments

Take care of yourself

  • Drink a lot of water
    • Especially when you drink alcohol
  • Get enough sleep
    • Don’t fry your brain, you only have one
  • Try to make a routine. Minimize trivial decisions
    • Save your energy for important things
  • Go for a walk every day

Become a writer

  • Write every day. No exceptions.

    • Start with 150 or 200 daily words
    • Ideal is 750 daily words
  • Once you see yourself as “someone who writes every day”, it will be easy to write papers, projects, theses, etc.

  • Get addicted to writing, as you are addicted to social media

  • Try the Pomodoro technique

Thank you!