October 3rd, 2016

What is a computer?

Summary of last week

A computer is a general-purpose device that can

• read, process and write numbers
• (and things that can be represented by numbers)
• to and from the memory
• following a program that is also stored in the memory
• in many simple steps

Changing the program changes the purpose of the machine

This week

• How information is coded in numbers
• How these numbers are stored and organized
• How we interact with computers
• Start using a specific tool: RStudio

The smallest information piece

The simplest answer to a question is yes or no

When we bet on a tossed coin, what do we know?

This elementary unit of information is called a bit (binary digit)

It can be represented by on/off, true/false, 0/1, etc.

Binary representation

For technical reasons, modern computers handle bits in groups of 8

Such a group is called a byte and can represent a number in the range 0 to 255
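As a small illustration of how eight bits become a number between 0 and 255, here is a minimal sketch in Python. The function name `bits_to_number` and the example bit pattern are made up for this example.

```python
# Eight bits, written from the most significant (left) to the least significant (right).
bits = [1, 0, 1, 1, 0, 0, 1, 0]

def bits_to_number(bits):
    """Return the number encoded by a list of 0/1 values."""
    value = 0
    for bit in bits:
        value = value * 2 + bit  # double what we have so far, then add the new bit
    return value

print(bits_to_number(bits))      # 178
print(bits_to_number([0] * 8))   # 0, the smallest byte
print(bits_to_number([1] * 8))   # 255, the largest byte
```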

Bigger numbers

binary representation

• Using two bytes (16 bits) we can represent numbers between 0 and 65535
• How? If $x$ and $y$ are two bytes, we can evaluate $x + 256y$
• One byte has the small part, the other has the big part
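The two-byte idea can be sketched in Python as follows. The function names are hypothetical; `x` is the byte with the small part and `y` the byte with the big part, as in the formula above.

```python
def two_bytes_to_number(x, y):
    """Combine the small part x and the big part y (each between 0 and 255)."""
    return x + 256 * y

def number_to_two_bytes(n):
    """Split a number between 0 and 65535 into (small part, big part)."""
    return n % 256, n // 256

print(two_bytes_to_number(0, 0))      # 0
print(two_bytes_to_number(255, 255))  # 65535
print(number_to_two_bytes(1000))      # (232, 3), because 232 + 256 * 3 = 1000
```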

Bigger numbers

binary representation

The idea can be extended to using 4 bytes (32 bits)

0 to 4 294 967 295

and also to using 8 bytes (64 bits)

From 0 to 18 446 744 073 709 551 615, about $1.8447 \cdot 10^{19}$
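These limits all follow the same rule: with $n$ bits the largest number is $2^n - 1$. A short Python check, where the function name `largest_unsigned` is only illustrative:

```python
def largest_unsigned(n_bytes):
    """Largest number that fits in n_bytes bytes when counting from 0."""
    n_bits = 8 * n_bytes
    return 2 ** n_bits - 1

for n_bytes in (1, 2, 4, 8):
    print(n_bytes, "bytes:", 8 * n_bytes, "bits, from 0 to", largest_unsigned(n_bytes))
```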

Negative numbers

With a small modification we can also represent signed numbers.

• The first bit represents the sign
• The rest of the bits represent the size of the number
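The scheme in the list above can be sketched for a single byte like this. It is only an illustration of the sign-bit idea, with a made-up function name; most real hardware uses a closely related variant called two's complement.

```python
def decode_signed_byte(byte):
    """Decode one byte: the first bit is the sign, the other 7 bits are the size."""
    sign_bit = byte // 128   # first (leftmost) bit: 0 means plus, 1 means minus
    magnitude = byte % 128   # the remaining 7 bits
    return -magnitude if sign_bit == 1 else magnitude

print(decode_signed_byte(0b00000101))  # 5
print(decode_signed_byte(0b10000101))  # -5
print(decode_signed_byte(0b01111111))  # 127, the largest value with 7 bits
```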

Example: Sound

• Sound is transformed into electricity by a microphone.
• The voltage is measured 44100 times each second
• Each sample is stored as a number on a CD

Two steps: sampling (in time) and discretization (in voltage)
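The two steps can be imitated in Python with an artificial signal. This is a rough sketch under stated assumptions: the 440 Hz tone stands in for the microphone voltage, and each sample is mapped to one of 65536 levels (two bytes per sample); neither value comes from the slides.

```python
import math

SAMPLE_RATE = 44100   # measurements per second, as on a CD
FREQUENCY = 440       # assumed tone in Hz, chosen only for illustration
LEVELS = 65536        # assumed number of levels (two bytes per sample)

def sample_tone(n_samples):
    """Return the first n_samples of the tone as whole numbers."""
    samples = []
    for i in range(n_samples):
        t = i / SAMPLE_RATE                               # sampling (in time)
        voltage = math.sin(2 * math.pi * FREQUENCY * t)   # value between -1 and 1
        level = round((voltage + 1) / 2 * (LEVELS - 1))   # discretization (in voltage)
        samples.append(level)
    return samples

print(sample_tone(5))
```

One second of sound produces 44100 such numbers.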

Example: Greyscale Image

• Each “point” has a value between 0 (black) and 255 (white)
• The correct name is pixel (picture element)
• they are stored line by line
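A tiny image makes the idea concrete. In this sketch the 2-by-3 image and the variable names are invented for illustration; each pixel takes one byte and the lines are stored one after another.

```python
# A tiny greyscale image: 2 lines of 3 pixels, 0 is black and 255 is white.
image = [
    [0, 128, 255],
    [255, 128, 0],
]
width = 3

# Store the pixels line by line, one byte per pixel.
stored = bytes(value for row in image for value in row)
print(list(stored))   # [0, 128, 255, 255, 128, 0]

# To rebuild the image we only need to know the width of a line.
rebuilt = [list(stored[i:i + width]) for i in range(0, len(stored), width)]
print(rebuilt == image)   # True
```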

Floating point

Using scientific notation we write $1.8447 \cdot 10^{19}$

Using the same idea, with powers of 2 instead of 10, we can use two numbers like this: $x \cdot 2^y$

There are two versions: single and double precision

They use 4 and 8 bytes, respectively
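Python's standard library can show both points: `math.frexp` splits a number into the two parts $x$ and $y$, and `struct.pack` shows how many bytes each precision uses. The value 6.0 is just an example.

```python
import math
import struct

x, y = math.frexp(6.0)    # 6.0 is written as x * 2**y
print(x, y)               # 0.75 3
print(x * 2 ** y)         # 6.0

single = struct.pack("<f", 6.0)   # single precision
double = struct.pack("<d", 6.0)   # double precision
print(len(single), len(double))   # 4 8
```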

Floating point standard

Notice that this approach has some limitations

Not all numbers are represented exactly

Floating point can also represent special values

• Inf: Positive Infinity, 1/0
• -Inf: Negative Infinity, -1/0
• NaN: Not a number, 0/0
• NA: Not Available, missing data
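The limitations and the special values can be observed in Python too. Two differences from the list above are worth noting: Python raises an error for `1/0` instead of returning infinity, so the sketch builds the values another way, and `NA` has no direct counterpart among Python's floating point values.

```python
import math

# Not all numbers are represented exactly.
total = 0.1 + 0.2
print(total)           # 0.30000000000000004
print(total == 0.3)    # False

# Special values.
inf = float("inf")
nan = inf - inf        # a result that is not a number
print(inf, -inf, nan)  # inf -inf nan
print(math.isnan(nan)) # True
print(nan == nan)      # False: NaN is not equal to anything, not even itself
```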

Homework: Memory size

How much can we store in the computer?

What is the size of the memory of your computer?

What is the size of the disk?

The memory (RAM) is like a desk. The disk is like a bookshelf.

Kinds of file

At a low level there is only one type of file

For us, it is useful to separate them into two kinds:

Text Files:
each byte is a character, we can read it
Binary Files:
bytes are grouped in binary numbers, representing images, sounds, etc.
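The difference is only in how the bytes are interpreted. The following Python sketch writes a small text file (the name `example.txt` is hypothetical), reads back its raw bytes, and then interprets the same four bytes first as characters and then as one binary number.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.txt")   # hypothetical file name
    with open(path, "w", encoding="ascii", newline="\n") as f:
        f.write("Hi!\n")
    with open(path, "rb") as f:                  # "rb" gives the raw bytes
        raw = f.read()

print(list(raw))              # [72, 105, 33, 10]

# Read as text: each byte is a character.
print(raw.decode("ascii"))

# Read as binary: the same 4 bytes grouped into a single number.
as_number = int.from_bytes(raw, "little")
print(as_number)
```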

Among binary files we have EXE files, which are programs for Windows

Representing text

The most natural way to represent a text document is to encode each letter with a single byte

There is a basic standard for English, called ASCII

Each number from 0 to 127 is either a symbol or a special signal, such as

• New Line
• End of Message
• Tab
• Space
• Backspace

ASCII code

Each code is the column label plus the row label; for example, A is 60 + 5 = 65. SP is the space, and empty cells are special signals.

    30   40   50   60   70   80   90  100  110  120
0          (    2    <    F    P    Z    d    n    x
1          )    3    =    G    Q    [    e    o    y
2   SP    *    4    >    H    R    \    f    p    z
3    !    +    5    ?    I    S    ]    g    q    {
4    "    ,    6    @    J    T    ^    h    r    |
5    #    -    7    A    K    U    _    i    s    }
6   \$    .    8    B    L    V    `    j    t    ~
7    %    /    9    C    M    W    a    k    u
8    &    0    :    D    N    X    b    l    v
9    '    1    ;    E    O    Y    c    m    w

Numbers between 128 and 255 are not used in ASCII

Non-English languages use these values for symbols like “Ç”, “Ö”, “É”, “Ñ”
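In Python, `ord` and `chr` convert between characters and their codes, which makes it easy to check the table. For the values above 127 the sketch assumes the Latin-1 encoding, which is one common convention among several.

```python
print(ord("A"), ord("a"), ord(" "))   # 65 97 32
print(chr(65), chr(122))              # A z

hello_codes = list("Hello".encode("ascii"))
print(hello_codes)                    # [72, 101, 108, 108, 111]

# Assumption: Latin-1, one of several encodings that use the values 128 to 255.
accented_codes = list("ÇÖÉÑ".encode("latin-1"))
print(accented_codes)                 # [199, 214, 201, 209]
```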

Text Files

• are universal
• are easy to read and write from a program
• do not have any style like bold or italic
• are like books without figures

Microsoft Word files (doc or docx) are NOT text files

You shall not use Word for this course

Text files are for humans and computers

• Binary files are hard to read
  • unless you have the correct program
• Text files can be read by humans
  • Each byte is a letter
• Text files can be read by computers
  • Data must be recyclable
  • The output of one program is the input of another program

Structure in Data

Today we will focus on a key idea.

To understand the data we need structure

For example, the folders on the disk form a hierarchical structure.

Structured documents

Text documents also have a logical structure

• Letters form words
• Several words become phrases and paragraphs
• Paragraphs are contained in sections and chapters
• Sometimes we have lists of elements
• Sometimes we have tabular data
• Figures
• References to other works

The problem

Ordinary word processors are based on the WYSIWYG (What You See Is What You Get) philosophy

Users are encouraged to change fonts, sizes, colors and other visual attributes

Separation of form and content

Writing and formatting at the same time is distracting.

The idea is to write first, and format later, as close as possible to the time of publication.

• WYSIWYG: What You See Is What You Get
  • Microsoft Word
• WYMIWYG: What You Mean Is What You Get
  • The information you enter defines the meaning of the document
  • The program generates beautiful output