STRM/STRC
STRC: A coded version of STR
Standardisation in progress
University of Groningen and the TRESOAR archive have standardised ports of departure, destination and shipmasters home port. By end of 2015 about 50.000 spelling variations of port names have been coded into a 6 letter alfanumeric code.
Standardisation using SQL
The raw data has been coded using Postgresql version 9.4 running on Ubuntu 14.04, while the online verions of the coded data has been imported on a different version of linux running postgres 9.1.
Loading main DATA
Please note that the following documentation of the import uses absolute file names. When using these routines the file location must be changed according to each users directory structure and where the data is downloaded.
Passages
If passages is downloaded from STRO using the query interface a CSV-file called passages_Year_=_YYYY.csv (e.g. YYYY=1754) will be created. The file consist of the selected passsages, first line includes field names, each field is seperated with comma and values are qouted using dobble-qoute ". Exported data is encoded in "Latin1". By mailing the STRO-group it is also possible to get larger chucks of data, however it is encoded in UTF-8, lacks field names and each field are seperated by semi-colon. The field name of all data is doorvaarten.csv. Therefore two import routines have been made, one for each of the supplied data format. Please note that the table definition are the same.
Import the complete data of passages
Passages are imported into a table named doorvaart, primary key is id_doorvaart.
-- Table structure for table 'doorvaart' -- -- DROP TABLE IF EXISTS "doorvaart"; CREATE TABLE "doorvaart" ( "id_doorvaart" INTEGER PRIMARY KEY NOT NULL, "dag" INTEGER, "maand" INTEGER, "jaar" INTEGER, "volgnummer" VARCHAR(255), "schipper_voornamen" VARCHAR(255), "schipper_patroniem" VARCHAR(255), "schipper_tussenvoegsel" VARCHAR(255), "schipper_achternaam" VARCHAR(255), "schipper_plaatsnaam" VARCHAR(255), "soort_korting" VARCHAR(255), "korting_daalder" VARCHAR(255), "korting_skilling" VARCHAR(255), "subtotaal1_daalder" VARCHAR(255), "subtotaal1_skilling" VARCHAR(255), "subtotaal2_daalder" VARCHAR(255), "subtotaal2_skilling" VARCHAR(255), "totaal_rosenobel" VARCHAR(255), "totaal_daalder" VARCHAR(255), "totaal_skilling" VARCHAR(255), "andere_muntsoort1" VARCHAR(255), "andere_muntsoort1_totaal" VARCHAR(255), "andere_muntsoort2" VARCHAR(255), "andere_muntsoort2_totaal" VARCHAR(255), "privilege" VARCHAR(255), "opmerking_bron" VARCHAR(500), "opmerking_invoerder" VARCHAR(255), "opmerking_controle" VARCHAR(255), "controle_1" VARCHAR(255), "controle_2" VARCHAR(255), "controle_3" VARCHAR(255), "naam_invoerder" VARCHAR(255), "datum_opgevoerd" VARCHAR(255), "dirnummer" VARCHAR(255), "tonnage" VARCHAR(255), "belasting_korting" VARCHAR(255), "images" VARCHAR(255), "hulp" VARCHAR(255), "hulp1" VARCHAR(255), "korting_ort" VARCHAR(255), "subtotaal1_ort" VARCHAR(255), "subtotaal2_ort" VARCHAR(255), "totaal_ort" VARCHAR(255), "hulp2" VARCHAR(255) ); COPY doorvaart FROM '/home/arne/STRM/data/doorvaarten.csv' DELIMITER ',' CSV HEADER QUOTE '"' ENCODING 'utf-8' NULL '';
Import selected and downloaded passages
The following code in Postgresql (SQL of the DBMS Postgres) imports the downloaded data into a table (relation) called doorvart. The complete code for importing all three tables, passages, cargoes and taxes can be downloaded here.
-- Table structure for table 'doorvaart' -- -- "id_doorvaart";"dag";"maand";"jaar";"volgnummer";"schipper_voornamen";"schipper_patroniem";"schipper_tussenvoegsel";"schipper_achternaam";"schipper_plaatsnaam";"soort_korting";"korting_daalder";"korting_skilling";"subtotaal1_daalder";"subtotaal1_skilling";"subtotaal2_daalder";"subtotaal2_skilling";"totaal_rosenobel";"totaal_daalder";"totaal_skilling";"andere_muntsoort1";"andere_muntsoort1_totaal";"andere_muntsoort2";"andere_muntsoort2_totaal";"privilege";"opmerking_bron";"opmerking_invoerder";"opmerking_controle";"controle_1";"controle_2";"controle_3";"naam_invoerder";"datum_opgevoerd";"dirnummer";"tonnage";"belasting_korting";"images";"hulp";"hulp1";"korting_ort";"subtotaal1_ort";"subtotaal2_ort";"totaal_ort";"hulp2" -- "410987";"29";"6";"1754";"373";"Gerrit";"";"";"Hendrichs";"Workum";"";"";"";"";"";"";"";"";"2";"";"";"";"";"";"";"";"";"";"J";"N";"N";"ERICB";"31-03-10";"226";"";"";"";"";"";"";"";"";"";"" DROP TABLE IF EXISTS "doorvaart"; CREATE TABLE "doorvaart" ( "id_doorvaart" INTEGER PRIMARY KEY NOT NULL, "dag" INTEGER, "maand" INTEGER, "jaar" INTEGER, "volgnummer" VARCHAR(255), "schipper_voornamen" VARCHAR(255), "schipper_patroniem" VARCHAR(255), "schipper_tussenvoegsel" VARCHAR(255), "schipper_achternaam" VARCHAR(255), "schipper_plaatsnaam" VARCHAR(255), "soort_korting" VARCHAR(255), "korting_daalder" VARCHAR(255), "korting_skilling" VARCHAR(255), "subtotaal1_daalder" VARCHAR(255), "subtotaal1_skilling" VARCHAR(255), "subtotaal2_daalder" VARCHAR(255), "subtotaal2_skilling" VARCHAR(255), "totaal_rosenobel" VARCHAR(255), "totaal_daalder" VARCHAR(255), "totaal_skilling" VARCHAR(255), "andere_muntsoort1" VARCHAR(255), "andere_muntsoort1_totaal" VARCHAR(255), "andere_muntsoort2" VARCHAR(255), "andere_muntsoort2_totaal" VARCHAR(255), "privilege" VARCHAR(255), "opmerking_bron" VARCHAR(255), "opmerking_invoerder" VARCHAR(255), "opmerking_controle" VARCHAR(255), "controle_1" VARCHAR(255), "controle_2" VARCHAR(255), "controle_3" VARCHAR(255), "naam_invoerder" VARCHAR(255), "datum_opgevoerd" VARCHAR(255), "dirnummer" VARCHAR(255), "tonnage" VARCHAR(255), "belasting_korting" VARCHAR(255), "images" VARCHAR(255), "hulp" VARCHAR(255), "hulp1" VARCHAR(255), "korting_ort" VARCHAR(255), "subtotaal1_ort" VARCHAR(255), "subtotaal2_ort" VARCHAR(255), "totaal_ort" VARCHAR(255), "hulp2" VARCHAR(255) ); COPY doorvaart FROM '/home/arne/STRM/data/passages_Year_=_1754.csv' DELIMITER ';' CSV HEADER QUOTE '"' ENCODING 'latin1' NULL '';
Taxes
Import all data
-- -- Table structure for table 'belastingen' -- DROP TABLE IF EXISTS "belasting"; CREATE TABLE "belasting" ( "id_doorvaart" INTEGER NOT NULL, "regel" INTEGER NOT NULL, "naam" VARCHAR(255), "muntsoort" VARCHAR(255), "hoofdbedrag" VARCHAR(255), "subbedrag" VARCHAR(255), "korting" VARCHAR(255), "hulp" VARCHAR(255), PRIMARY KEY (id_doorvaart, regel) ); COPY belasting FROM '/home/arne/STRM/data/belastingen.csv' DELIMITER ',' CSV HEADER QUOTE '"' ENCODING 'utf-8' NULL '';
Import selected data
-- -- Table structure for table 'belasting' -- -- "id_doorvaart";"regel";"naam";"muntsoort";"hoofdbedrag";"subbedrag";"korting";"hulp" -- "410987";"1";"Fyrpenge";"Daler";"2";"";"N";"" DROP TABLE IF EXISTS "belasting"; CREATE TABLE "belasting" ( "id_doorvaart" INTEGER NOT NULL, "regel" INTEGER NOT NULL, "naam" VARCHAR(255), "muntsoort" VARCHAR(255), "hoofdbedrag" VARCHAR(255), "subbedrag" VARCHAR(255), "korting" VARCHAR(255), "hulp" VARCHAR(255), PRIMARY KEY (id_doorvaart, regel) ); COPY belasting (id_doorvaart, regel, naam, muntsoort, hoofdbedrag, subbedrag, korting, hulp) FROM '/home/arne/STRM/data/taxes_Year_=_1754.csv' DELIMITER ';' CSV HEADER QUOTE '"' ENCODING 'latin1' NULL '';
Cargoes
Cargoes is imported into a table lading. In order to achieve higher performance on the later coding of data three indexes are made.
Importing all data
-- -- Table structure for table 'ladingen' -- DROP TABLE IF EXISTS "lading"; CREATE TABLE "lading" ( "id_doorvaart" INTEGER NOT NULL, "regel" INTEGER NOT NULL, "van" VARCHAR(255), "naar" VARCHAR(255), "maat" VARCHAR(255), "aantal" VARCHAR(255), "soort" VARCHAR(255), "daalder" VARCHAR(255), "skilling" VARCHAR(255), "maat_alt" VARCHAR(255), "aantal_alt" VARCHAR(255), "ort" VARCHAR(255), "hulp" VARCHAR(255), PRIMARY KEY (id_doorvaart, regel) ); COPY lading FROM '/home/arne/STRM/data/ladingen.csv' DELIMITER ',' CSV HEADER QUOTE '"' ENCODING 'utf-8' NULL '';
Importing a selected subset
-- -- Table structure for table 'lading' -- -- "id_doorvaart";"regel";"van";"naar";"maat";"aantal";"soort";"daalder";"skilling";"maat_alt";"aantal_alt";"ort";"hulp" -- "410987";"1";"Amst.";"Narva";"";"";"Ballast";"";"";"";"";"";"" DROP TABLE IF EXISTS "lading"; CREATE TABLE "lading" ( "id_doorvaart" INTEGER NOT NULL, "regel" INTEGER NOT NULL, "van" VARCHAR(255), "naar" VARCHAR(255), "maat" VARCHAR(255), "aantal" VARCHAR(255), "soort" VARCHAR(255), "daalder" VARCHAR(255), "skilling" VARCHAR(255), "maat_alt" VARCHAR(255), "aantal_alt" VARCHAR(255), "ort" VARCHAR(255), "hulp" VARCHAR(255), PRIMARY KEY (id_doorvaart, regel) ); drop index ix_soort, ix_van, ix_naar; create index ix_soort on lading (soort); create index ix_van on lading (van); create index ix_naar on lading (naar); COPY lading FROM '/home/arne/STRM/data/cargoes_Year_=_1754.csv' DELIMITER ';' CSV HEADER QUOTE '"' ENCODING 'latin1' NULL '';
Loading aux DATA
In order to manage the automatic coding and converting of data a set of auxillary tables have been created. These tables could be called inverted tables, since each table consist of the unique field values for selected field, is frequency. To each of these tables standardised or coded fields will be added. Later in the coding process the uniques spelling of places, commodities, measurements etc. will be replaced by its code. This type of coding is context-insensitive, i.e. it does not take the context of the data into consideration. In a context-sensitive coding the context of the data is important, e.g. the same measurement expressed in words has a different metric value if the ship departed from Bergen or from Riga. Many measurements have this property, a scottish pound was different from a London pound. Measurements also changed over time, the length of an inch was different not only in place but changed also over time Context-insensitive coding/conversion is done purely on the meaning of the word itself and does not take time or space into consideration. The time-space conversion is done on a later stage in order to improve and make the context-sensitive coding.
Standardised place names
The standardisation of placenames have been done by George Welling (RUG) and X Y from Tresoar archive. The file all_places_standard.csv is loaded into a table named stdplaces.
Please note that in stdplaces is using a 8 letter code, the first three letters is numeric, the three alfa (A-Z) and some places are coded with addtional 2 digits. The two digits are used if the place name consist of two different places, e.g. the ship was intended to go to Spain however it planned to go to Norway to do extra loading before sailing to Spain. This way of coding the two destion ports are not recommended, rather the two destionations should have been coded into two 6 letter codes in order to have a strict coding system. In order to achive this following procedure is done:
- Two six letter plcodes have been created
- If the placename consist of two placenames, these are automatically sepearated
-- Create table stdplaces -- "schipper","Aace","000000","Unknown","Unindentifiable Ports","Unknown",,"1" -- DROP TABLE IF EXISTS "stdplaces"; CREATE TABLE "stdplaces" ( plctype varchar(8), plcname varchar(255), plccod8 varchar(8), stdname varchar(255), region varchar(255), bigregion varchar(255), bfield boolean, onefield varchar(255) ); CREATE INDEX ix_pclname ON stdplaces (lower(plcname)); CREATE INDEX ix_stdname ON stdplaces (lower(stdname)); COPY stdplaces (plctype, plcname, plccod8, stdname, region, bigregion, bfield, onefield) FROM '/home/arne/STRM/data/all_places_standard.csv' DELIMITER ',' CSV HEADER QUOTE '"' ENCODING 'utf-8' NULL ''; ALTER TABLE stdplaces DROP COLUMN bfield; ALTER TABLE stdplaces DROP COLUMN onefield; ALTER TABLE stdplaces ADD COLUMN plccod varchar(6); ALTER TABLE stdplaces ADD COLUMN plcco2 varchar(6); ALTER TABLE stdplaces ADD COLUMN place varchar(255); ALTER TABLE stdplaces ADD COLUMN place2 varchar(255); UPDATE stdplaces SET plccod = substring(plccod8, 1, 6);
Port of departure
Port of destination
Classification systems by Nina Bang
- Nina E. Bangs classification of commodities converted into a database table.
- Nina E. Bangs coding of ports (departure, port of intended destination, shipmaster home port)
- Converted into a database table. Download as XLS. Download as DBF.
- Download the list of ports as geocoded GIS file (ArcGIS). Download as DBF