Class 7: Structured Documents

Methodology of Scientific Research

Andrés Aravena, PhD

March 24, 2022

To understand complex things, we organize them using structures

The most common structures are

  • Tables: several things with the same set of attributes
  • Hierarchies: like files and folders, or taxonomy
  • Networks: social, biological, communications, etc.
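
As a rough illustration, the three structures can be sketched with Python's built-in types. All names and values below are made up for this example, and the helper `depth` is hypothetical.

```python
# Table: several things with the same set of attributes (made-up data)
table = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
]

# Hierarchy: folders that contain other folders and files
hierarchy = {"home": {"docs": {"paper.txt": None}, "photos": {}}}

# Network: who is connected to whom
network = {"Alice": ["Bob"], "Bob": ["Alice", "Carol"], "Carol": ["Bob"]}


def depth(tree):
    """Return the number of nested levels in a hierarchy."""
    if not tree:  # a file (None) or an empty folder adds no levels
        return 0
    return 1 + max(depth(child) for child in tree.values())
```

Every row of the table has the same keys, the hierarchy nests to any depth, and the network lets any item link to any other.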

Text documents have hierarchical structure

We can see it in papers, theses, etc.

  • Title
  • Sections
    • Subsections
      • Lists
      • Figures
      • Tables
  • References to other works
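
The hierarchy above can be written down as nested data. This is only a sketch: the titles are placeholders, and the field names (`title`, `sections`, `subsections`, `references`) and the function `outline` are assumptions chosen for this example.

```python
# A document as a hierarchy: each section has a title and subsections.
# The titles are placeholders, not taken from a real paper.
document = {
    "title": "Paper title",
    "sections": [
        {"title": "Introduction", "subsections": []},
        {"title": "Methods", "subsections": [
            {"title": "Data", "subsections": []},
            {"title": "Analysis", "subsections": []},
        ]},
    ],
    "references": ["Reference 1"],
}


def outline(sections, level=0):
    """Return the section titles as indented lines, one per section."""
    lines = []
    for section in sections:
        lines.append("  " * level + section["title"])
        lines.extend(outline(section["subsections"], level + 1))
    return lines


print("\n".join(outline(document["sections"])))
```

Once the structure is explicit, a program can walk it, for example to list all sections in order.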

Two versions of the same paper

Let’s take a look at the paper

Searls DB. “Ten simple rules for online learning”. PLoS Computational Biology. Epub 2012 Sep 13.

Find it at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3441493/

Version 1: PubReader format

Version 2: Plain text format

What is the difference?

First version has text and style

It is a rich text document

Title, author, and sections are marked with

  • size
  • weight
  • color

Second version has only text, no style

This is a plain text document

The text is the same

There is no style

Plain text has no structure

It is hard to find the document parts

  • What is the title?
  • Who is the author?
  • What are the sections?

Decoration marks structure

For example

Change font style: Bold, italics
Change font size: smaller, larger
Change alignment: centering

You could do it with an old typewriter

This is how it was done in 1946

(this is the first description of modern computers)

You can also do it in Word®

This is what most people do

The problem

Style is not Structure

In word processors like Word®,
What You See Is What You Get

This is sometimes called WYSIWYG

It is easy to change fonts, sizes, colors, and other visual attributes without paying attention to structure

It is like a house

Structure makes the house solid.

If you only do decoration, the house looks nice but it is not solid.

The wall structure comes first; “painting” is secondary

Signs that something is wrong

  • You spend too much time choosing fonts and colors
  • The style is incoherent
  • Changing style is a lot of work
  • It is hard to make a Table of Contents
  • You use the mouse a lot

Separation of form and content

Writing and styling at the same time is a bad idea

The idea is to write first, and style later

Choose the style just before publishing

What You Mean Is What You Get

  • The information you enter defines the meaning of the document
    • You mark the structure
  • The program generates beautiful output
    • Style can change
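
A minimal sketch of this idea, assuming a document is stored as a list of (kind, text) pairs. The sample text, the kind names, and both rendering functions are hypothetical and written only for this example. The same content is rendered in two different styles without being edited.

```python
import html

# Content: only structure and text, no fonts or colors (made-up sample).
content = [
    ("title", "My report"),
    ("section", "Introduction"),
    ("paragraph", "Some text."),
]


def render_text(content):
    """Style 1: plain text, with an upper-case title and underlined sections."""
    lines = []
    for kind, text in content:
        if kind == "title":
            lines.append(text.upper())
        elif kind == "section":
            lines.append(text)
            lines.append("-" * len(text))
        else:
            lines.append(text)
    return "\n".join(lines)


# Style 2: each structural element maps to one HTML tag.
HTML_TAGS = {"title": "h1", "section": "h2", "paragraph": "p"}


def render_html(content):
    """Style 2: HTML, where a stylesheet can later choose fonts and colors."""
    return "\n".join(
        f"<{HTML_TAGS[kind]}>{html.escape(text)}</{HTML_TAGS[kind]}>"
        for kind, text in content
    )


print(render_text(content))
print(render_html(content))
```

Changing the style means changing a rendering function, not the document.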

Many versions of the same paper

One version on PubMed Central

It can be in “PubReader” format

It can be in PDF format

Or on the PLOS website

All have the same structure

even if they have different styles

How to write a structured document

Separation of concerns

You are a scientist, not a graphic designer

  • You read papers
  • You plan and do experiments
  • You analyze the results
  • You write the papers and reports
  • You do not choose fonts, colors, or sizes

Focus on the structural elements

  • Sections
  • Subsections
  • Normal Paragraphs
  • Lists
    • Numbered
    • Unnumbered
  • Computer code

Collaboration

Collaborating

Sharing Word documents by email is a VERY BAD IDEA
It leads to chaos and confusion

Use an online service

You can share your document via Dropbox or Google Drive

You can edit online using Microsoft Office 365 or Google Docs

Several people can work on the same document at the same time

Advantage: better spelling and grammar correction

Disadvantage: they require a permanent internet connection