# Methodology of Scientific Research

## Easy numbers to remember

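| Quantity | Value | Easy value |
|---|---|---|
| π | 22/7 | 3 |
| $$\sqrt{10}$$ | 3.16 | 3 |
| seconds/day | 86400 | $$10^5$$ |
| weeks/year | 52 | 50 |
| days/year | 365 | 1000/3 |
| *E. coli* genome | $$4.5\times10^6$$ bp | $$5\times10^6$$ bp |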

## Exercise: Multiple sequence alignment

To identify the function of a gene, one strategy is to

• compare it to similar genes, and
• identify the polymorphisms

This is called Multiple Sequence Alignment

Aligning $$N$$ sequences of length $$m$$ requires on the order of $$m^N$$ individual comparisons

## How many seconds

To fix ideas, let’s assume that $$m=1000$$
(That is a typical size for a bacterial gene)

Now assume that the computer can do one million comparisons each second

What is the time needed to align $$N$$ sequences?

## Exercise 0

How many seconds will it take to align 2, 4, 8, and 12 sequences?

## How much is that?

Under these hypotheses we get this table

| $$N$$ | Seconds | In words |
|---|---|---|
| 2 | $$10^0$$ | 1 second |
| 4 | $$10^6$$ | 1 million seconds |
| 8 | $$10^{18}$$ | 1 quintillion (short scale) or 1 trillion (long scale) seconds |
| 12 | $$10^{30}$$ | a lot of time |
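The table can be reproduced with a few lines of Python. This is a minimal sketch that assumes, as above, about $$m^N$$ comparisons at one million comparisons per second; the function name `alignment_seconds` is a hypothetical choice for illustration.

```python
import math

def alignment_seconds(n_seqs: int, m: int = 1000, rate: float = 1e6) -> float:
    """Seconds needed for about m**n_seqs comparisons at `rate` comparisons per second."""
    return m ** n_seqs / rate

for n in (2, 4, 8, 12):
    seconds = alignment_seconds(n)
    # log10 gives the exponent shown in the "Seconds" column
    print(f"N = {n:2d}: 10^{round(math.log10(seconds))} seconds")
```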

## Exercise 1

Translate these numbers to days, years, etc.
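A possible sketch for the conversion, assuming 86,400 seconds per day and 365 days per year. With the easy numbers from the first slide ($$10^5$$ s/day and $$1000/3$$ days/year) the results change by less than 10%.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365

def to_days_and_years(seconds: float) -> tuple[float, float]:
    """Convert a duration in seconds to (days, years)."""
    days = seconds / SECONDS_PER_DAY
    return days, days / DAYS_PER_YEAR

# "Seconds" column of the table, indexed by the number of sequences N
table = {2: 1e0, 4: 1e6, 8: 1e18, 12: 1e30}
for n, seconds in table.items():
    days, years = to_days_and_years(seconds)
    print(f"N = {n:2d}: {days:.2g} days, {years:.2g} years")
```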

## Exercise 2

How do these numbers change if $$m$$ changes?
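Since the time grows like $$m^N$$, multiplying $$m$$ by a factor $$k$$ multiplies the time by $$k^N$$. A short sketch comparing a few sequence lengths; the values 500 and 2000 are arbitrary examples, not taken from the text.

```python
def alignment_seconds(n_seqs: int, m: int, rate: float = 1e6) -> float:
    """Seconds for about m**n_seqs comparisons at `rate` comparisons per second."""
    return m ** n_seqs / rate

for m in (500, 1000, 2000):
    row = ", ".join(f"N={n}: {alignment_seconds(n, m):.1e} s" for n in (2, 4, 8, 12))
    print(f"m = {m}: {row}")

# Doubling m multiplies the time by 2**N; for N = 8 that is about 256 times longer
print(alignment_seconds(8, 2000) / alignment_seconds(8, 1000))
```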

## Exercise 3

What happens if the computers are 1000 times faster?
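If the computer is $$s$$ times faster, the time for a fixed $$N$$ is divided by $$s$$, and keeping the same time budget lets us add only $$\log_m s$$ sequences. A minimal sketch; the helper name `extra_sequences` is hypothetical.

```python
import math

def extra_sequences(speedup: float, m: int = 1000) -> float:
    """Extra sequences that fit in the same time on a `speedup`-times faster computer.

    The time is about m**N / rate, so multiplying the rate by `speedup`
    lets N grow by log_m(speedup) while the time stays the same.
    """
    return math.log(speedup, m)

print(extra_sequences(1000))          # with m = 1000: exactly one more sequence
print(extra_sequences(1000, m=100))   # with shorter sequences: about 1.5 more
```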

## Exercise 4

What is the largest multiple alignment that you can do in your life?
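One way to attack this exercise is to search for the largest $$N$$ whose $$m^N$$ comparisons fit into a time budget. The sketch below assumes a lifetime of 80 years of continuous computing at one million comparisons per second; both the 80 years and the function name `max_sequences` are illustrative assumptions.

```python
SECONDS_PER_YEAR = 365 * 86_400

def max_sequences(seconds: float, m: int = 1000, rate: float = 1e6) -> int:
    """Largest N such that m**N comparisons finish within `seconds`."""
    budget = seconds * rate  # total number of comparisons we can afford
    n = 0
    while m ** (n + 1) <= budget:  # exact integer powers instead of a rounded log()
        n += 1
    return n

LIFETIME_YEARS = 80  # assumption: one computer running for 80 years
print(max_sequences(LIFETIME_YEARS * SECONDS_PER_YEAR))
```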

## Exercise 5

What is the largest number of sequences that can be aligned?

## Heuristic

This is clearly too expensive, so we need heuristics

(i.e. solving a similar but simpler problem)

There are fast Multiple Sequence Alignment methods, but they are approximate

We do something similar when we calculate with rounded, easy-to-remember numbers

## How to estimate values

In other words, come up with a reasonably close solution.

1. If you can, estimate the answer directly
2. If you can’t estimate the answer, break the problem into smaller pieces and estimate the answer for each one

We have already done the first approach. Today we do the second

This part follows the book
“Guesstimation: Solving the World’s Problems on the Back of a Cocktail Napkin” by Lawrence Weinstein and John A. Adam

# Divide and Conquer

## This is a “Fermi problem”

How many piano tuners are there in Los Angeles?

This is a classic example that originated with Enrico Fermi
(in the 1930s)

It is used at the beginning of many physics courses because it

• requires the methods and reasoning used in science, but
• does not need any physics concepts

## How to answer this question

This is a complicated problem. We cannot just estimate the answer

To solve this, we need to break down the problem

We need to estimate

1. how many pianos there are in Los Angeles
2. how many pianos each tuner can care for

How would you do it?

## To estimate the number of pianos, we need

1. the population of the city

2. the proportion of people that own a piano

3. the number of schools, churches, etc. that also have pianos

## Number of pianos each tuner can care for

We need to estimate

1. how often each piano is tuned

2. how much time it takes to tune a piano

3. how much time a piano tuner spends tuning pianos each year

## Estimating the population of Los Angeles

• It must be much less than $$10^8$$
• since the population of the USA is $$3\times10^8$$
• It must be much more than $$10^6$$
• since that is the size of an ordinary big city
• We estimate it at $$10^7$$
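When a lower and an upper bound differ by orders of magnitude, a common way to pick a value between them is the geometric mean, which sits halfway between the bounds on a logarithmic scale. Applied to the bounds above, it reproduces the estimate of $$10^7$$. A minimal sketch:

```python
import math

def geometric_mean(lower: float, upper: float) -> float:
    """Value halfway between two positive bounds on a logarithmic scale."""
    return math.sqrt(lower * upper)

# Population of Los Angeles: much more than 1e6, much less than 1e8
print(geometric_mean(1e6, 1e8))  # → 10000000.0
```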

## Proportion of pianos per person

Pianos will be owned by individuals, schools, and houses of worship

• About 10% of the population plays a musical instrument
• it’s surely more than 1% and less than 100%
• At most 10% of musicians play the piano
• not all of them own a piano
• the proportion that own a piano is probably 2–3% of the musicians
• This would be $$2\times10^{-3}$$ of the population

## Institutional pianos

• There is about one house of worship per thousand people
• each of those will have a piano
• There is about one school per 500 students
• or about 1 per 1000 population
• each of those will have a piano

Adding the $$2\times10^{-3}$$ household pianos, this gives us about $$4\times10^{-3}$$ pianos per person

Thus, the number of pianos will be about $$10^7\times4\times10^{-3} = 4\times10^4$$
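Collecting the per-person estimates from the last two slides gives the total. A minimal sketch that only restates the numbers above; the variable names are our own.

```python
population = 1e7

# Pianos per person, from the estimates above
households = 0.10 * 0.02   # 10% play an instrument, ~2% of them own a piano
worship = 1 / 1000         # one house of worship per thousand people
schools = 1 / 1000         # one school per thousand people

per_person = households + worship + schools
pianos = population * per_person
print(f"{per_person:.0e} pianos per person, {pianos:.0e} pianos")
```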

## How often each piano is tuned per year

Pianos will be tuned less than once per month and more than once per decade

We’ll estimate once per year.
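The geometric mean of the two bounds, 12 tunings per year (monthly) and 0.1 per year (once per decade), is about 1.1 tunings per year, which agrees with the estimate of once per year. A short sketch of that check:

```python
import math

# Bounds on tunings per year: once per month (12) and once per decade (0.1)
upper, lower = 12.0, 0.1
tunings_per_year = math.sqrt(lower * upper)  # geometric mean of the bounds
print(f"{tunings_per_year:.1f} tunings per year")
```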

## How much time it takes to tune each piano

It must take much more than 30 minutes and less than one day to tune a piano (assuming that it is not too badly out of tune)

We’ll estimate 2 hours

Another way to look at it is that there are 88 keys

• At 1 minute per key, it will take 1.5 hours
• At 2 minutes per key, it will take 3 hours
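The per-key estimate is simple arithmetic: 88 keys times the minutes per key, converted to hours. A sketch, with the function name `tuning_hours` chosen for illustration:

```python
KEYS = 88

def tuning_hours(minutes_per_key: float, keys: int = KEYS) -> float:
    """Hours to tune a piano, assuming every key takes the same time."""
    return keys * minutes_per_key / 60

for minutes in (1, 2):
    print(f"{minutes} min/key -> {tuning_hours(minutes):.1f} hours")
```

Both values bracket the 2-hour estimate.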

## How much time each piano tuner works per year

A full-time worker works

• 8 hours per day
• 5 days per week
• 50 weeks per year

which gives 8 × 5 × 50 = 2000 hours

At 2 hours per piano, in 2000 hours she can tune about 1000 pianos per year
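Putting the pieces together gives an estimate for the number of piano tuners in Los Angeles. The sketch below only multiplies and divides the estimates from the previous slides; the result is an order-of-magnitude estimate, not a count.

```python
pianos = 4e4                  # pianos in Los Angeles
tunings_per_year = 1          # each piano is tuned about once per year
hours_per_tuning = 2          # time to tune one piano
hours_per_year = 8 * 5 * 50   # one full-time tuner: 2000 hours per year

pianos_per_tuner = hours_per_year / hours_per_tuning   # about 1000
tuners = pianos * tunings_per_year / pianos_per_tuner
print(f"about {tuners:.0f} piano tuners")
```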

## Update

Do you think these values are still valid?

How do you think these values changed?

Why?

# Where is everybody?

In 1950, at the Los Alamos National Laboratory, four scientists (Emil Konopinski, Edward Teller, Herbert York, and Enrico Fermi) had a casual conversation about flying saucers during lunch

This quickly turned into a discussion about the possibility of sophisticated societies populating the universe

During the discussion, Enrico Fermi made this casual remark:

“Where is everybody?”

## Fermi’s reasoning

Herbert York wrote in 1984 that Fermi “followed up with a series of

• calculations on the probability of earth-like planets,
• the probability of life given an earth,
• the probability of humans given life,
• the likely rise and duration of high technology
• and so on

He concluded on the basis of such calculations that we ought to have been visited long ago and many times over”

## Drake equation (1961)

$N = R_* \cdot f_\mathrm{p} \cdot n_\mathrm{e} \cdot f_\mathrm{l} \cdot f_\mathrm{i} \cdot f_\mathrm{c} \cdot L$

where

• $$N$$ = the number of civilizations in our galaxy with which communication might be possible
• $$R_{*}$$ = the average rate of star formation in our Galaxy
• $$f_{p}$$ = the fraction of those stars that have planets
• $$n_{e}$$ = the average number of planets that can potentially support life per star that has planets

## Drake equation (cont…)

$N = R_* \cdot f_\mathrm{p} \cdot n_\mathrm{e} \cdot f_\mathrm{l} \cdot f_\mathrm{i} \cdot f_\mathrm{c} \cdot L$

where

• $$f_{l}$$ = the fraction of planets that could support life that actually develop life at some point
• $$f_{i}$$ = the fraction of planets with life that actually go on to develop civilizations
• $$f_{c}$$ = the fraction of civilizations that develop a technology that releases detectable signs of their existence into space
• $$L$$ = the length of time for which such civilizations release detectable signals into space

## “Educated guesses” by Drake et al

• $$R_{*}$$ = 1 $$\text{yr}^{-1}$$ (1 star formed per year, on average)
• $$f_{p}$$ = 0.2 to 0.5 (1/5 to 1/2 of all stars will have planets)
• $$n_{e}$$ = 1 to 5 (stars with planets will have between 1 and 5 planets capable of developing life)
• $$f_{l}$$ = 1 (100% of these planets will develop life)
• $$f_{i}$$ = 1 (100% of which will develop intelligent life)
• $$f_{c}$$ = 0.1 to 0.2 (10–20% of which will be able to communicate)
• $$L$$ = 1000 to 100,000,000 years (communicative civilizations will last somewhere between 1000 and 100,000,000 years)

## Initial results

With the lowest guesses we get a minimum $$N$$ of 20

The highest guesses give a maximum of 50,000,000

The result varies by more than six orders of magnitude depending on the assumptions
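Because every factor of the Drake equation is nonnegative, the smallest $$N$$ comes from multiplying all the lowest guesses and the largest from multiplying all the highest guesses. A minimal sketch with Drake's ranges from the previous slide:

```python
import math

# (low, high) guesses for each factor of the Drake equation
guesses = {
    "R_star": (1, 1),            # stars formed per year
    "f_p": (0.2, 0.5),
    "n_e": (1, 5),
    "f_l": (1, 1),
    "f_i": (1, 1),
    "f_c": (0.1, 0.2),
    "L": (1_000, 100_000_000),   # years
}

n_min = math.prod(low for low, _ in guesses.values())
n_max = math.prod(high for _, high in guesses.values())
print(f"N between {n_min:.0f} and {n_max:.0f}")
```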

Question: How can we know the range of values for $$N$$?