UNIT 04 - Big Data

EXPECTATIONS: Before we start this unit you should be comfortable with the following:

  • Data is stored in structures
  • Encrypted data is not casually viewable
  • "Nefarious actors" are present throughout the web
  • Massive data files exist on citizens and consumers

LEARNING GOALS -- At the conclusion of this unit, I will be able to:

      1. Document the history of computers from the 1950s room-sized boxes to mini-computers and micro-computers
      2. Compare and contrast floppy drives, hard drives, RAID arrays, RAM, and ROM (including PROM & EPROM)
      3. Accurately describe "Big Data"
      4. Explain and/or sketch data "striping"
      5. Generally explain how data parity bits relate to data recovery
      6. Analyze my personal Google archive
      7. Explain the characteristics of a phishing attack
      8. Explain the characteristics of a DoS/DDoS attack
      9. Explain the characteristics of the Stuxnet worm
      10. Explain the methods and flaws exploited in the Equifax hack
      11. Explain the methods and characteristics of a ransomware attack
      12. Describe plausible storage, security, or privacy concerns for particular pieces of data
      13. Explain why encryption is an important need for everyday life on the Internet
      14. Crack a message encrypted with a Caesar cipher using a Caesar Cipher Widget
      15. Crack a message encrypted with random substitution using Frequency Analysis
      16. Create an encryption key, send an encrypted message and have a colleague decrypt my message with that key
      17. Explain the weaknesses and security flaws of basic ciphers
      18. Describe AES encryption to a casual user



    • CPU: Central Processing Unit (often called the Processor). The 'brains' of a computer. It controls the retrieval of data from the storage device as well as when and how that data is processed once it is loaded into RAM.
    • Hard Drives: Data is stored magnetically on a spinning platter as a series of 0's and 1's. Operates at millisecond (10^-3 second) time scales.
    • RAM: Random Access Memory. Holds the data and instructions the CPU is currently working on. Operates at nanosecond (10^-9 second) time scales.
    • ROM: Read Only Memory. Used by manufacturers to store critical processing information at a specific, addressable place. Data on the ROM chip cannot be erased, modified or otherwise changed.
    • PROM: Programmable Read-Only Memory. Created empty by manufacturers in bulk and sold to companies. Those companies then "burn in" information on the chip that cannot be changed thereafter.
    • EPROM: Erasable Programmable Read-Only Memory. Existing data on the chip can be erased by exposing it to strong UV light, after which the chip can be programmed again.
    • EEPROM: Electrically Erasable Programmable Read-Only Memory. Existing data on the chip can be erased using specifically timed pulses of electricity.
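
To get a feel for the gap between the millisecond and nanosecond time scales listed above, here is a small arithmetic sketch. It uses the round figures from the list (10^-3 and 10^-9 second), not measurements of any particular device, and the variable names are made up for this illustration.

```python
from fractions import Fraction

# Round figures from the vocabulary list, not real device measurements.
HARD_DRIVE_SECONDS = Fraction(1, 10**3)   # millisecond time scale
RAM_SECONDS = Fraction(1, 10**9)          # nanosecond time scale

# How many RAM-scale operations fit in one hard-drive-scale operation?
ratio = HARD_DRIVE_SECONDS / RAM_SECONDS

# If one RAM operation were stretched to a full second,
# how many days would one hard drive operation take?
SECONDS_PER_DAY = 60 * 60 * 24
days = ratio / SECONDS_PER_DAY

print(f"Hard drive scale is {int(ratio):,} times slower than RAM scale")
print(f"Stretched out, that is about {float(days):.1f} days per hard drive operation")
```

On these round numbers, the hard drive scale is a million times slower than the RAM scale.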


    • Big Data - a broad term for data sets so large or complex that traditional (single computer) data processing applications are inadequate.
    • Moore's Law - a prediction made by Gordon Moore in 1965 (and revised in 1975) that the number of transistors on a chip, and with it roughly computing power, doubles about every 1.5-2 years. It held more or less true for decades.
    • Server Software - Software dedicated to a single purpose for multiple users, such as file servers, database servers, web servers, etc. It usually runs on a dedicated server computer, several computers, or sometimes racks of computers:
      • Database Server Software: Used to store, add, edit and update data stored on servers
        • Oracle - VERY expensive
        • Microsoft SQL Server - Just plain expensive
        • MySQL (Free/Open Source, owned by Oracle)
      • File Server Software: Used to store individual files for employees, students etc...
        • Microsoft Windows Server (roughly 1/3 of the market)
        • Linux (Free/Open Source, about 2/3 of the market)
        • Unix (less than 1% of the market and fading fast)
      • Email Server Software: Many organizations and corporations have migrated their email away from their own servers to the 'cloud'
        • Microsoft (market share ?)
        • Google (market share ?)
      • Programming Language Server Software: Allows specific programming language to execute as part of a website
        • Microsoft Visual Basic
        • PHP
        • Others?
      • Web Hosting Server Software: Allows an individual, group, corporation, government or organization to host web sites
        • Amazon Web Services
        • Apache (Free/Open Source)
        • Microsoft Azure (?)

Types of Computers:

    • Original computers (1950s - 1960s): Room-sized computing devices that required strict environmental controls
    • Mini-Computers (1970s - present): Scaled-down computers that were the size of a BIG desk. Accessed through "dumb" terminals
    • Micro-Computers (1980 - present): Computers that sit on top of our desks (desktops, laptops, Chromebooks)
    • SuperComputers (1970s - present): High-powered, high-speed machines used for jobs involving massive numbers of variables, such as weather forecasting and nuclear bomb simulations. Typically used by large universities and governments. They have been and remain VERY expensive.
    • Servers: Computers usually dedicated to a single purpose for multiple users, such as file servers, database servers, web servers, etc.

Data Storage:

    • Mirroring - A data storage process where all data is written to two (or more) separate and distinct locations. If the data in one location is lost or damaged, the copy in the other location is immediately available.
    • RAID Array - Redundant Array of Independent Disks. Allows for linking multiple hard drives together to act as a single data storage device
    • Rack - A cabinet holding multiple servers each typically holding multiple hard drives
    • Parity - A process where a small amount of extra data, calculated from the original data, is written along with it to a disk or disk array. If part of the data is lost, it can be recreated (very quickly) from the surviving data plus the parity.
    • Striping - The process of writing data across multiple hard drives to increase performance
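
Striping and parity are easier to see with a toy example. The sketch below splits a message into small chunks, deals them across three pretend "disks" (plain Python lists), and keeps an XOR parity chunk for each row so that any one lost disk can be rebuilt. The function names and the 4-byte chunk size are assumptions made for this illustration. Real RAID controllers work on much larger blocks, and some RAID levels spread the parity across all the drives instead of keeping it in one place.

```python
from functools import reduce
from operator import xor

def stripe(data: bytes, num_disks: int, chunk_size: int = 4) -> list[list[bytes]]:
    """Split data into chunks and deal them round-robin across the disks."""
    # Pad with zero bytes so every disk ends up with the same number of chunks.
    stripe_bytes = num_disks * chunk_size
    padded = data + b"\x00" * (-len(data) % stripe_bytes)
    disks = [[] for _ in range(num_disks)]
    for i in range(0, len(padded), chunk_size):
        disks[(i // chunk_size) % num_disks].append(padded[i:i + chunk_size])
    return disks

def xor_chunks(chunks) -> bytes:
    """XOR equal-length chunks together, one byte position at a time."""
    return bytes(reduce(xor, column) for column in zip(*chunks))

def parity_disk(disks) -> list[bytes]:
    """One parity chunk per row: the XOR of that row's chunk on every disk."""
    return [xor_chunks(row) for row in zip(*disks)]

def rebuild(surviving_disks, parity) -> list[bytes]:
    """Recreate the single missing disk from the survivors plus the parity."""
    return [xor_chunks(list(row) + [p])
            for row, p in zip(zip(*surviving_disks), parity)]

data = b"LITTLE BO PEEP HAS LOST"
disks = stripe(data, num_disks=3)
parity = parity_disk(disks)

# Pretend disk 1 has failed and rebuild it from disks 0 and 2 plus the parity.
rebuilt = rebuild([disks[0], disks[2]], parity)
print(rebuilt == disks[1])
```

The rebuild works because XOR undoes itself: XORing the parity with every surviving chunk in a row leaves exactly the missing chunk.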

Sample Hacking/Attack Schemes:

  • DoS/DDoS - A Denial of Service (or Distributed Denial of Service) attack occurs when a nefarious actor floods targeted servers with massive numbers of repeated "ping" or other requests in an attempt to crash or overwhelm those servers. In the distributed version, the traffic comes from many computers at once.
  • Phishing - A process where nefarious actors try to gain access to a corporate, organizational, or personal computer or network by sending disguised emails carrying malicious links or attachments to individuals in that organization
  • Worm - A computer program that self-replicates and spreads on its own across networks such as the internet and, in cases like Stuxnet, via USB drives.


  • Caesar Cipher - a technique for encryption that shifts the alphabet by some number of characters
  • Cipher - the generic term for a technique (or algorithm) that performs encryption
  • Cracking encryption - When you attempt to decode a secret message without knowing all the specifics of the cipher, you are trying to "crack" the encryption.
  • Decryption - a process that reverses encryption, taking a secret message and reproducing the original plain text
  • Encryption - a process of encoding messages to keep them secret, so only "authorized" parties can read them.
  • Random Substitution Cipher - an encryption technique that maps each letter of the alphabet to a randomly chosen other letter of the alphabet.
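
The cipher vocabulary above can be tried out in a few lines of Python. This is a minimal sketch, not the Caesar Cipher Widget used in class: the function names are made up for this illustration, and it only handles the 26 English letters (everything else passes through unchanged). It shifts a message, lists all 26 possible shifts so a Caesar cipher can be cracked by eye, and counts letters as a starting point for frequency analysis.

```python
import string
from collections import Counter

ALPHABET = string.ascii_uppercase

def caesar_shift(text: str, shift: int) -> str:
    """Shift every letter by `shift` places, wrapping around from Z to A."""
    result = []
    for ch in text.upper():
        if ch in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            result.append(ch)  # spaces and punctuation are left alone
    return "".join(result)

def crack_caesar(ciphertext: str) -> list[tuple[int, str]]:
    """Try every possible shift; a human picks the one that reads as English."""
    return [(shift, caesar_shift(ciphertext, -shift)) for shift in range(26)]

def letter_frequencies(text: str) -> list[tuple[str, int]]:
    """Count the letters in the text, most common first."""
    return Counter(ch for ch in text.upper() if ch in ALPHABET).most_common()

secret = caesar_shift("HELLO", 3)
print(secret)                      # encrypted message
print(caesar_shift(secret, -3))    # decrypting reverses the shift

for shift, guess in crack_caesar(secret):
    print(shift, guess)

print(letter_frequencies(secret))
```

Because there are only 26 possible shifts, a Caesar cipher can be cracked by simply trying them all. A random substitution cipher has far too many possible keys for that, which is why cracking it relies on letter counts: the most common letters in a long encrypted message likely stand for common English letters such as E and T.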

Little Bo Peep Has Lost Her Sheep

and Radar Cannot Find Them

They'll all (face to face)

Meet in parallel Space

Preceding Their Leaders Behind them