103.3 - Database and CSV Intro
A database is just a bunch of stored data.
A photo is a visual database. It stores data related to a time and place in history.
With this example you can see that you should not limit your concept of databases to "tables" as has been traditionally done.
Now, on to digital databases...
The data stored in digital Databases is most traditionally text (a string of characters put together) or numeric (a number). These often times get further categorized into more specific subcategories like varchar, nvarchar, date, time, etc, but aren't the just all characters? Sometimes links to image files might be stored, but those are just some text, as are links to other files or website pages.
Databases can in fact also store image files and data files in raw form (binary form), but that is only for the big guys, and that is not something we will get into. You store your image files and digital files in raw form on a hard drive or a cloud drive and you will view those via a file manager -- hey wait.. then doesn't that just make your hard drive one big database of digital data? Yes, basically it does, and your file manager then is in fact just your database manager.
For digital databases, a basic understanding of tables, rows, columns and Comma Separated Values (CSVs) is very important. This terms and these concepts are the foundational vocabulary and structures used to organize data for database purposes.
Coincidentally, that same understanding of tables, rows and columns will help you navigating around a spread sheet, and that is one reason why spreadsheets make for a great simple database management system (for certain applications).
CSV understanding becomes very relevant as you try to import and export data from one resource to another. It is the poor mans way of convey amazing amounts of complex data in an utterly simple manner. The ways people have tried to complicate what can be done with Comma Separate Values and other delimiters never ceases to amaze me.
If you begin to use the FREE Bookkeeping System we've made a available online (ZAP Accounting Software), if you want to import data into that as opposed to hand typing every transaction, you will be downloading CSV files from your banking institution, so understanding the basics of databases and CSV files has lots of application and relevance, and it's just not that complicated.
This page includes the following sections:
A photo is a visual database. It stores data related to a time and place in history.
With this example you can see that you should not limit your concept of databases to "tables" as has been traditionally done.
Now, on to digital databases...
The data stored in digital Databases is most traditionally text (a string of characters put together) or numeric (a number). These often times get further categorized into more specific subcategories like varchar, nvarchar, date, time, etc, but aren't the just all characters? Sometimes links to image files might be stored, but those are just some text, as are links to other files or website pages.
Databases can in fact also store image files and data files in raw form (binary form), but that is only for the big guys, and that is not something we will get into. You store your image files and digital files in raw form on a hard drive or a cloud drive and you will view those via a file manager -- hey wait.. then doesn't that just make your hard drive one big database of digital data? Yes, basically it does, and your file manager then is in fact just your database manager.
For digital databases, a basic understanding of tables, rows, columns and Comma Separated Values (CSVs) is very important. This terms and these concepts are the foundational vocabulary and structures used to organize data for database purposes.
Coincidentally, that same understanding of tables, rows and columns will help you navigating around a spread sheet, and that is one reason why spreadsheets make for a great simple database management system (for certain applications).
CSV understanding becomes very relevant as you try to import and export data from one resource to another. It is the poor mans way of convey amazing amounts of complex data in an utterly simple manner. The ways people have tried to complicate what can be done with Comma Separate Values and other delimiters never ceases to amaze me.
If you begin to use the FREE Bookkeeping System we've made a available online (ZAP Accounting Software), if you want to import data into that as opposed to hand typing every transaction, you will be downloading CSV files from your banking institution, so understanding the basics of databases and CSV files has lots of application and relevance, and it's just not that complicated.
This page includes the following sections:
- Intro to Tables, Rows, Columns and CSV
- Example of a Data Table with Rows and Columns in CSV Format
- Example of a Data Table with Rows and Column Names in CSV Format
- Delimiters other than Comma ( ; || or whatever you want!! )
- Example of a Data Table in Graphical Format in Google Sheets
1) Intro to Tables, Rows, Columns and CSV
Historically and generally speaking, data is divided into things called "data tables". Data tables are in fact multi-dimensional arrays, but we visually represent those as two dimensinal grids, like graph paper, but with larger cells.
While we could refer to a data tables by an unique number (1 for the first one we created and 2 for the second one, etc) Table Names are assigned to tables for ease of reference.
Each Data Table is a matrix consisting of rows and columns.
Most frequently, a record (a row of data, which consists of separate bits of data referred to as columns or fields) was conveyed from one digital platform to another in CSV format (Comma Separated Value Format).
While we could refer to a data tables by an unique number (1 for the first one we created and 2 for the second one, etc) Table Names are assigned to tables for ease of reference.
- Names of tables which are self-explanatory might be: Customers, Orders, Students, etc
Each Data Table is a matrix consisting of rows and columns.
- A row contains values related to a given "record".
- A column defines distinct bits of data that made up the record. A column is also often referred to as a "field" or a "database field". The term field implies something big and expansive. I have no clue the origin of that term, but it is confusing to newbies...
Most frequently, a record (a row of data, which consists of separate bits of data referred to as columns or fields) was conveyed from one digital platform to another in CSV format (Comma Separated Value Format).
- Comma separated values format is exactly what it sounds like.
- Comma's are used as delimiters between parts of data related to a single record
- An example of comma separated values for a Customer record is: Bob, Jones, 1234 Artichoke Lance, Castroville, CA, 95012 -- where the data between the commas is First Name, Last Name, Address Line 1, City, State, Zip
- (and this works fine for voluminous amounts of data as long as a comma is NOT part of the discrete data being separated -- if it is, things called "escape characters" are used to tell the parser to ignore that delimiter )
2) Example of a Data Table with Rows and Columns in CSV Format
- The Data Table name is "Customers"
- A Record in Comma Separated Values is: Bob, Jones, 1234 Artichoke Lance, Castroville, CA, 95012
With such a nomenclature, if I made a request to the database for column 3, the database would return "1234 Artichoke Lane" and anyone familiar with the table structure would realize column 3 = Street address. However anyone unfamiliar with the database may not know exactly what this is (although obviously in this case, it is a street number and name for something)
If there were multiple records, I could ask for column 2 from record 10, and with this type of request system, you can see how a simple matrix can be used to store and retrieve data.
While this system works, it requires the referencing of columns by number and there's not a clear method for referencing rows other than by number -- and what happens if those get shifted around for some reason?
3) Example of a Data Table with Rows and Column Names in CSV Format
Adding a list of Column Names as labels for columns makes interacting with a data table easier.
- The Data Table name is "Customers"
- A Column Name list (aka field name list) in Comma Separated Values is: First Name, Last Name, Address Line 1, City, State, Zip (where each comma denoted a new column)
- A Record in Comma Separated Values is: Bob, Jones, 1234 Artichoke Lance, Castroville, CA, 95012
- With such a nomenclature, if I made a request to the database for City, the database would return "Castroville" (and the correlation is made because "City" is in the 4th position in it's own list, and thus the 4th position in the referenced record is returned.
- Field names are generally a better system for labeling columns because if you decide to change the order of of the columns, as long as you are using a field name, you don't need to change anything in your request. You simply reference the field name and the database system will find it's index number in real time and then return you the corresponding data you requested.
So, for example, if for some strange reason someone wanted to change the order of the Field Names to First Name, Last Name, ZIP, Address Line 1, City, State and the data went with it... (Bob, Jones, 95012, 1234 Artichoke Lane, Castroville, CA) If I requested City, I'd still get "Castroville" even though it had actually shifted from column 4 to column 5.
4) Delimiters other than Comma ( ; || or whatever you want!! )
Comma is the most common delimiter, but it really is arbitrary. Other delimiters that are used include the semicolon (;) and the double bar (||) often referred to as "pipes". Anything can be used as a delimiter as long as 1) those decoding it are told what it is and 2) the delimiter does not show up as part of the data in the column (and/or if it does, escape characters are used to mark it and those decoding it know about the escape characters).
5) Example of a Data Table in Graphical Format in Google Sheets
CSV data files are files that end in .csv. They are in fact just text files. If you have a plain old text editor you can open the files and you will see the field names separated by commas and then the subsequent rows will just be rows of data with commas separating the column data (each new row would represent a new record).
This day in age, if you have a .csv file and double click on it, it will likely open up by default into a spreadsheet and if it doesn't, you can go into the spread sheet application and select open file and you will be able to open the data file directly in your spread sheet (typically -- there is a strange bug in Google Sheets which has been reported that prevents it from opening some CSV files. You can identify the ones it won't open because they are blue without white squiggles).
The image below is representative of what you might see if you opened a .csv file with the table name, field names and record data we've presented above.
This day in age, if you have a .csv file and double click on it, it will likely open up by default into a spreadsheet and if it doesn't, you can go into the spread sheet application and select open file and you will be able to open the data file directly in your spread sheet (typically -- there is a strange bug in Google Sheets which has been reported that prevents it from opening some CSV files. You can identify the ones it won't open because they are blue without white squiggles).
The image below is representative of what you might see if you opened a .csv file with the table name, field names and record data we've presented above.
The "Tab" at the bottom called "Customers" is representative of the "data table" name. The field names are in row 1. The data record is in row two. Technically I could refer to the columns by a number (like column 1 or column 2), a letter (like column A or column B) or by the field name which is denoted in the first row (first name or last name).
You can imagine that if we added 10 or 100 or 1000 more customers to our sheet/table, we would have a "Customer Database" (aka a Customer List or a Customer Table).
Please note, a "true database software package" might in some ways look similar to a spreadsheet, but it will NOT allow you to move data around in the cells like we've learned to do on a todo list.. so while they visually look similar, using a spreadsheet for a database table has both it's pros and cons. There are times to do this and times not to do this and generally speaking the time to do this is for smaller, less complex data storage needs where there are only one or a few people accessing data and all are educated on how they could corrupt the data.
You can imagine that if we added 10 or 100 or 1000 more customers to our sheet/table, we would have a "Customer Database" (aka a Customer List or a Customer Table).
Please note, a "true database software package" might in some ways look similar to a spreadsheet, but it will NOT allow you to move data around in the cells like we've learned to do on a todo list.. so while they visually look similar, using a spreadsheet for a database table has both it's pros and cons. There are times to do this and times not to do this and generally speaking the time to do this is for smaller, less complex data storage needs where there are only one or a few people accessing data and all are educated on how they could corrupt the data.