STRM/STRC: Forskjell mellom sideversjoner

Fra hbrgeo
 
(40 mellomliggende revisjoner av samme bruker vises ikke)
Linje 3: Linje 3:
==Standardisation in progress==
==Standardisation in progress==


University of Groningen and the TRESOAR archive have standardised ports of departure, destination and shipmasters home port. By end of 2015 about 50.000 spelling variations of port names have been coded into a 6 letter alfanumeric code.
The standardisation, coding, classification and conversion (coding in short) of the STR data is a work in progress (and in its infancy). The procedures below codes a larger part of the STR data and make a 5% sample available for crosstabulation.
 
The process starts with loading the raw data into a database. Then secondly a large number of auxilliary tables are loaded and manipulated using queries, functions, semi-coded data etc.
 
Thirdly the data itself is coded which creates a paralell set of coded raw data. Since the raw data consist of 6-7 million records a subset is created and made available at the internet for viewing and cross-tabulation


==Standardisation using SQL==
==Standardisation using SQL==
The raw data has been coded using Postgresql version 9.4 running on Ubuntu 14.04, while the online verions of the coded data has been imported on a different version of linux running postgres 9.1.


===Loading main DATA===
===Loading main DATA===
Please note that the following documentation of the import uses absolute file names. When using these routines the file location must be changed according to each users directory structure and where the data is downloaded.
====Passages====
====Passages====
If passages is downloaded from STRO using the query interface a CSV-file called passages_Year_=_YYYY.csv (e.g. YYYY=1754) will be crated. The file consist of the selected passsages, first line includes field names, each field  is seperated with comma and values are qouted using dobble-qoute ". Exported data is encoded in "Latin1". By mailing the STRO-group it is also possible to get larger chucks of data, however it is encoded in UTF-8, lacks field names and each field are seperated by semi-colon. Therefore two import routines have been made, one for each of the supplied data format.
If passages is downloaded from STRO using the query interface a CSV-file called passages_Year_=_YYYY.csv (e.g. YYYY=1754) will be created. The file consist of the selected passsages, first line includes field names, each field  is seperated with comma and values are qouted using dobble-qoute ". Exported data is encoded in "Latin1". By mailing the STRO-group it is also possible to get larger chucks of data, however it is encoded in UTF-8, lacks field names and each field are seperated by semi-colon. The field name of all data is doorvaarten.csv. Therefore two import routines have been made, one for each of the supplied data format. Please note that the table definition are the same.


=====Import the complete data of passages=====
Passages are imported into a table named doorvaart, primary key is id_doorvaart.


<pre class="code">
<pre class="code">
-- Table structure for table 'doorvaart'
-- Table structure for table 'doorvaart'
--
--
-- "id_doorvaart";"dag";"maand";"jaar";"volgnummer";"schipper_voornamen";"schipper_patroniem";"schipper_tussenvoegsel";"schipper_achternaam";"schipper_plaatsnaam";"soort_korting";"korting_daalder";"korting_skilling";"subtotaal1_daalder";"subtotaal1_skilling";"subtotaal2_daalder";"subtotaal2_skilling";"totaal_rosenobel";"totaal_daalder";"totaal_skilling";"andere_muntsoort1";"andere_muntsoort1_totaal";"andere_muntsoort2";"andere_muntsoort2_totaal";"privilege";"opmerking_bron";"opmerking_invoerder";"opmerking_controle";"controle_1";"controle_2";"controle_3";"naam_invoerder";"datum_opgevoerd";"dirnummer";"tonnage";"belasting_korting";"images";"hulp";"hulp1";"korting_ort";"subtotaal1_ort";"subtotaal2_ort";"totaal_ort";"hulp2"
--  
-- "410987";"29";"6";"1754";"373";"Gerrit";"";"";"Hendrichs";"Workum";"";"";"";"";"";"";"";"";"2";"";"";"";"";"";"";"";"";"";"J";"N";"N";"ERICB";"31-03-10";"226";"";"";"";"";"";"";"";"";"";""
 
DROP TABLE IF EXISTS "doorvaart";
DROP TABLE IF EXISTS "doorvaart";


Linje 46: Linje 53:
   "andere_muntsoort2_totaal"  VARCHAR(255),
   "andere_muntsoort2_totaal"  VARCHAR(255),
   "privilege"  VARCHAR(255),
   "privilege"  VARCHAR(255),
   "opmerking_bron"  VARCHAR(255),
   "opmerking_bron"  VARCHAR(500),
   "opmerking_invoerder"  VARCHAR(255),
   "opmerking_invoerder"  VARCHAR(255),
   "opmerking_controle"  VARCHAR(255),
   "opmerking_controle"  VARCHAR(255),
Linje 66: Linje 73:
   "hulp2" VARCHAR(255)
   "hulp2" VARCHAR(255)
);
);
COPY doorvaart FROM '/home/arne/STRM/data/doorvaarten.csv' DELIMITER ',' CSV HEADER QUOTE '"' ENCODING 'utf-8' NULL '';
</pre>
=====Import selected and downloaded passages=====
The following code in Postgresql (SQL of the DBMS Postgres) imports the downloaded data into a table (relation) called doorvart. The complete code for importing all three tables, passages, cargoes and taxes can be downloaded here.
<pre class="code">
-- Table structure for table 'doorvaart'
--
--  "id_doorvaart";"dag";"maand";"jaar";"volgnummer";"schipper_voornamen";"schipper_patroniem";"schipper_tussenvoegsel";"schipper_achternaam";"schipper_plaatsnaam";"soort_korting";"korting_daalder";"korting_skilling";"subtotaal1_daalder";"subtotaal1_skilling";"subtotaal2_daalder";"subtotaal2_skilling";"totaal_rosenobel";"totaal_daalder";"totaal_skilling";"andere_muntsoort1";"andere_muntsoort1_totaal";"andere_muntsoort2";"andere_muntsoort2_totaal";"privilege";"opmerking_bron";"opmerking_invoerder";"opmerking_controle";"controle_1";"controle_2";"controle_3";"naam_invoerder";"datum_opgevoerd";"dirnummer";"tonnage";"belasting_korting";"images";"hulp";"hulp1";"korting_ort";"subtotaal1_ort";"subtotaal2_ort";"totaal_ort";"hulp2"
-- "410987";"29";"6";"1754";"373";"Gerrit";"";"";"Hendrichs";"Workum";"";"";"";"";"";"";"";"";"2";"";"";"";"";"";"";"";"";"";"J";"N";"N";"ERICB";"31-03-10";"226";"";"";"";"";"";"";"";"";"";""
</pre>
Create table as above, then
<pre class="code">


COPY doorvaart FROM '/home/arne/STRM/data/passages_Year_=_1754.csv' DELIMITER ';' CSV HEADER QUOTE '"' ENCODING 'latin1' NULL '';
COPY doorvaart FROM '/home/arne/STRM/data/passages_Year_=_1754.csv' DELIMITER ';' CSV HEADER QUOTE '"' ENCODING 'latin1' NULL '';
Linje 72: Linje 95:


====Taxes====
====Taxes====
=====Import all data=====
<pre class="code">
--
-- Table structure for table 'belastingen'
--
DROP TABLE IF EXISTS "belasting";
CREATE TABLE "belasting" (
  "id_doorvaart" INTEGER NOT NULL,
  "regel" INTEGER NOT NULL,
  "naam" VARCHAR(255),
  "muntsoort" VARCHAR(255),
  "hoofdbedrag" VARCHAR(255),
  "subbedrag" VARCHAR(255),
  "korting" VARCHAR(255),
  "hulp" VARCHAR(255),
  PRIMARY KEY (id_doorvaart, regel)
);
COPY belasting FROM '/home/arne/STRM/data/belastingen.csv' DELIMITER ',' CSV HEADER QUOTE '"' ENCODING 'utf-8' NULL '';
</pre>
=====Import selected data=====
<pre class="code">
--
-- Table structure for table 'belasting'
--
-- "id_doorvaart";"regel";"naam";"muntsoort";"hoofdbedrag";"subbedrag";"korting";"hulp"
-- "410987";"1";"Fyrpenge";"Daler";"2";"";"N";""
DROP TABLE IF EXISTS "belasting";
CREATE TABLE "belasting" (
  "id_doorvaart" INTEGER NOT NULL,
  "regel" INTEGER NOT NULL,
  "naam" VARCHAR(255),
  "muntsoort" VARCHAR(255),
  "hoofdbedrag" VARCHAR(255),
  "subbedrag" VARCHAR(255),
  "korting" VARCHAR(255),
  "hulp" VARCHAR(255),
  PRIMARY KEY (id_doorvaart, regel)
);
COPY belasting (id_doorvaart, regel, naam, muntsoort, hoofdbedrag, subbedrag, korting, hulp) FROM '/home/arne/STRM/data/taxes_Year_=_1754.csv' DELIMITER ';' CSV HEADER QUOTE '"' ENCODING 'latin1' NULL '';
</pre>
====Cargoes====
====Cargoes====
Cargoes is imported into a  table lading. In order to achieve higher performance on the later coding of data three indexes are made.
=====Importing all data=====
<pre class="code">
--
-- Table structure for table 'ladingen'
--
DROP TABLE IF EXISTS "lading";
CREATE TABLE "lading" (
  "id_doorvaart" INTEGER NOT NULL,
  "regel" INTEGER NOT NULL,
  "van" VARCHAR(255),
  "naar" VARCHAR(255),
  "maat" VARCHAR(255),
  "aantal" VARCHAR(255),
  "soort" VARCHAR(255),
  "daalder" VARCHAR(255),
  "skilling" VARCHAR(255),
  "maat_alt" VARCHAR(255),
  "aantal_alt" VARCHAR(255),
  "ort" VARCHAR(255),
  "hulp" VARCHAR(255),
  PRIMARY KEY (id_doorvaart, regel)
);
COPY lading FROM '/home/arne/STRM/data/ladingen.csv' DELIMITER ',' CSV HEADER QUOTE '"' ENCODING 'utf-8' NULL '';
</pre>
=====Importing a selected subset=====
<pre class="code">
--
-- Table structure for table 'lading'
--
-- "id_doorvaart";"regel";"van";"naar";"maat";"aantal";"soort";"daalder";"skilling";"maat_alt";"aantal_alt";"ort";"hulp"
-- "410987";"1";"Amst.";"Narva";"";"";"Ballast";"";"";"";"";"";""
</pre>
Create table as above, then import
<pre class="code">
COPY lading FROM '/home/arne/STRM/data/cargoes_Year_=_1754.csv' DELIMITER ';' CSV HEADER QUOTE '"' ENCODING 'latin1' NULL '';
</pre>
===Loading aux DATA===
===Loading aux DATA===
In order to manage
In order to manage the automatic coding and converting of data a set of auxillary tables have been created. These tables could be called inverted tables, since each table consist of the unique field values for selected field, is frequency. To each of these tables standardised or coded fields will be added. Later in the coding process the uniques spelling of places, commodities, measurements etc. will be replaced by its code.  This type of coding is context-insensitive, i.e. it does not take the context of the data into consideration. In a context-sensitive coding the context of the data is important, e.g. the same  measurement expressed in words has a different metric value if the ship departed from Bergen or from Riga. Many measurements have this property, a scottish pound was different from a London pound. Measurements also changed over time, the length of an inch was different not only in place but changed also over time Context-insensitive coding/conversion is done purely on the meaning of the word itself and does not take time or space into consideration. The time-space conversion is done on a later stage in order to improve and make the context-sensitive coding.
 
====Standardised place names====
====Standardised place names====
The standardisation of placenames have been done by George Welling (RUG) and X Y from Tresoar archive. The file all_places_standard.csv is loaded into a table named stdplaces.
University of Groningen and the TRESOAR archive have standardised ports of departure, destination and shipmasters home port. By end of 2015 about 50.000 spelling variations of port names have been coded into a 6 letter alfanumeric code.
Please note that in stdplaces is using a 8 letter code, the first three letters is numeric, the three alfa (A-Z) and some places are coded with addtional 2 digits. The two digits are used if the place name consist of two different places, e.g. the ship was intended to go to Spain however it planned to go to Norway to do extra loading before sailing to Spain. This way of coding the two destion ports are not recommended, rather the two destionations should have been coded into two 6 letter codes in order to have a strict  coding system.  In order to achive this following procedure is done:
*Two six letter plcodes have been created
*If the placename consist of two placenames, these are automatically sepearated
Stdplaces is not part of the coded database. To get the data please contact STRO
<pre class="code">
-- Create table stdplaces
-- "schipper","Aace","000000","Unknown","Unindentifiable Ports","Unknown",,"1"
--
DROP TABLE IF EXISTS "stdplaces";
CREATE TABLE "stdplaces" (
    plctype varchar(8),
    plcname varchar(255),
    plccod8 varchar(8),
    stdname varchar(255),
    region varchar(255),
    bigregion varchar(255),
    bfield boolean,
    onefield varchar(255)
);
CREATE INDEX ix_pclname ON stdplaces (lower(plcname));
CREATE INDEX ix_stdname ON stdplaces (lower(stdname));
COPY stdplaces (plctype, plcname, plccod8, stdname, region, bigregion, bfield, onefield) FROM '/home/arne/STRM/data/all_places_standard.csv' DELIMITER ',' CSV HEADER QUOTE '"' ENCODING 'utf-8' NULL '';
 
ALTER TABLE stdplaces DROP COLUMN bfield;
ALTER TABLE stdplaces DROP COLUMN onefield;
ALTER TABLE stdplaces ADD COLUMN plccod varchar(6);
ALTER TABLE stdplaces ADD COLUMN plcco2 varchar(6);
ALTER TABLE stdplaces ADD COLUMN place varchar(255);
ALTER TABLE stdplaces ADD COLUMN place2 varchar(255);
UPDATE stdplaces SET plccod = substring(plccod8, 1, 6);
</pre>
====Port of departure====
====Port of departure====
A sample of the data can be [http://solli.h.uib.no/showtab.php?tabname=portdept viewed here]
<pre class="code">
--
-- Port of departure
--
DROP TABLE IF EXISTS "portdept";
CREATE TABLE "portdept" (
  id serial not null,
  name varchar(255),
  freq integer,
    plccod char(6),
    primsel integer,
    frgnsel integer,
  PRIMARY KEY (name)
);
-- Using header to skip first line which contains null value
COPY portdept (name, freq) FROM '/home/arne/STRM/data/ports_departure.csv' DELIMITER ',' CSV HEADER QUOTE '"' ENCODING 'utf-8' NULL '';
</pre>
====Port of destination====
====Port of destination====
All unique spelling variations of name of port of destination and the frequency of the term.
A sample of the data can be [http://solli.h.uib.no/showtab.php?tabname=portdest viewed here]
<pre class="code">
--
-- Port of destination
--
DROP TABLE IF EXISTS "portdest";
CREATE TABLE "portdest" (
  id serial not null,
  name varchar(255),
  freq integer,
    plccod varchar(6),
    frgnsel integer,
    primsel integer,
  PRIMARY KEY (name)
);
-- Using header to skip first line which contains null value
COPY portdest (name, freq) FROM '/home/arne/STRM/data/ports_destination.csv' DELIMITER ',' CSV HEADER QUOTE '"' ENCODING 'utf-8' NULL '';
</pre>
====Ship masters home port====
All unique spelling variations of name of ship masters home port and the frequency of the term
A sample of the data can be [http://solli.h.uib.no/showtab.php?tabname=smhport viewed here]
<pre class="code">
--
-- Ship masters home Port
--
DROP TABLE IF EXISTS "smhport";
CREATE TABLE "smhport" (
  id serial not null,
  name varchar(255),
  freq integer,
    plccod char(6),
    frgnsel integer,
    primsel integer,
  PRIMARY KEY (name)
);
-- Using header to skip first line which contains null value
COPY smhport (name, freq) FROM '/home/arne/STRM/data/shipmaster_home_ports.csv' DELIMITER ',' CSV HEADER QUOTE '"' ENCODING 'utf-8' NULL '';
</pre>
====Classifcation or Subsets created from stdplaces====
=====Unique ports=====
A table of unique ports are created from all_places_standardised. The six alfanumeric code is used as identification of a unique port. The table uniqport is used during coding of geograpichal codes and classification of regions, current country etc. A select-query creates the table uniqports and afterwards new columns are added and inititated.
A sample of the data can be [http://solli.h.uib.no/showtab.php?tabname=uniqport viewed here]
<pre class="code">
--
-- Create table uniqport
--
DROP TABLE IF EXISTS "uniqport"  CASCADE;
SELECT substring(plccod8,1, 6) AS plccod, min(stdname) AS stdname, min(region) as region, min(bigregion) as bigregion INTO uniqport FROM stdplaces where (stdname is NOT NULL) GROUP BY substring(plccod8,1, 6);
ALTER TABLE uniqport ADD COLUMN currcntry varchar(25);
ALTER TABLE uniqport ADD COLUMN eastwest char(2);
ALTER TABLE uniqport ADD COLUMN primsel integer;
ALTER TABLE uniqport ADD PRIMARY KEY(plccod);
UPDATE uniqport set currcntry = region where ((region='Norway') or (region='Scotland') or (region='Italy') or (region='Ireland') or (region='Denmark'));
UPDATE uniqport set currcntry = region where ((region='Spain') or (region='Esthonia') or (region='Iceland') or (region='Portugal') or (region='The Netherlands'));
UPDATE uniqport set currcntry = 'France' where (region LIKE 'France%');
UPDATE uniqport set currcntry = 'Germany' where (region LIKE 'Hamburg%');
UPDATE uniqport set primsel=0;
UPDATE portdept SET plccod =  stdplaces.plccod FROM stdplaces WHERE stdplaces.plcname=portdept.name;
UPDATE portdest SET plccod =  stdplaces.plccod FROM stdplaces WHERE stdplaces.plcname=portdest.name;
UPDATE smhport SET plccod =  stdplaces.plccod FROM stdplaces WHERE stdplaces.plcname=smhport.name;
</pre>
=====Region=====
A sample of the data can be [http://solli.h.uib.no/showtab.php?tabname=region viewed here]
<pre class="code">
--
-- Create table region
--
DROP TABLE IF EXISTS "region";
SELECT region as name, count(region) as freq, min(bigregion) as bigregion INTO region FROM stdplaces where (region is NOT NULL) GROUP BY region;
ALTER TABLE region ADD COLUMN currcntry varchar(25);
ALTER TABLE region ADD COLUMN eastwest char(1);
ALTER TABLE region ADD PRIMARY KEY(name);
UPDATE region set eastwest = bigregion.eastwest FROM bigregion where bigregion.name=region.bigregion;
UPDATE region set eastwest = 'W' where ((name like 'The Neth%') or (name='Norway') or (name like '%Medi%') or (name like 'Engl%')  or (name like '%Atla%'));
UPDATE region set eastwest = 'E' where (name like '%Baltic%');
UPDATE uniqport set eastwest = region.eastwest from region where uniqport.region = region.name;
</pre>
=====Big regions=====
A sample of the data can be [http://solli.h.uib.no/showtab.php?tabname=bigregion viewed here]
<pre class="code">
--
-- Create table bigregion
--
DROP TABLE IF EXISTS "bigregion";
SELECT bigregion as name, count(bigregion) as freq, min(region) as minregion, max(region) as maxregion INTO bigregion FROM stdplaces where (bigregion is NOT NULL) GROUP BY bigregion;
ALTER TABLE bigregion ADD COLUMN eastwest char(1);
ALTER TABLE bigregion ADD PRIMARY KEY(name);
UPDATE bigregion set eastwest = 'W' where ((name like 'Dutc%') or (name like 'Nor%') or (name like '%Medi%') or (name like 'Great%') or (name like '%Amer%') or (name like '%Atla%') or (name like 'Ice%') or (name like 'Arc%') or (name like '%Far East%') or (name like 'West%'));
UPDATE bigregion set eastwest = 'E' where (name like '%Baltic%');
</pre>
===Creating functions===
Advanced text comparsions functions is nessesary since the data consist of many spelling variations of the same word. In order to compare places, commodties, measurements, classes etc the jaro-winkler function for word or string similarity has been added to the database. This can be downloaded from [https://github.com/eulerto/pg_similarity github] and loaded into the database simply by the statement
<pre code="class">
\i pg_similarity.sql
</pre>
In order to convert fractions to decimal numbers, and to split combined fields like placenames or commodities three general functions have been created
splitog (s, splitstring, fieldno) as text: Takes a input, e.g. destination port "København og Riga" or commodities "Silke og bommuld" and split the string into two. Input is the text to be split, second the split
argument and thirdly which part to return, first or second
<pre class="code">
create or replace function splitog(varchar,varchar,integer) returns text as $$
declare
p integer;
X text;
sub text;
nret integer;
lsub integer;
begin
  X := trim(both $1::text);
  sub:= $2::text;
  nret := $3::integer;
  lsub := length(sub);
  if ((X='') or (X='-')) then
    return null;
  end if;
  p := 0;
  p := strpos(X, sub);
  if (p>1) then
    if nret=1 then
      X := trim(both substr(X,1,p));
    else
      X := trim(both substr(X, p+lsub,length(X)));
    end if;
  end if;
  return X::text;
EXCEPTION
  WHEN invalid_text_representation then
      return null;
end;
$$ language plpgsql immutable strict;
</pre>
frac2num (s text) as double precision: Takes a string input, e.g. '2 3/4' and converts the text into 2.75.
<pre class="code">
create or replace function frac2num(varchar) returns float8 as $$
declare
p integer;
n float8;
num float8;
den float8;
X text;
begin
  X := trim(both $1::text);
  if ((X='') or (X='-')) then
    return null;
  end if;
  p := 0;
  n := 0;
  p := strpos(X, '/');
  if (p=0) then
    p:=strpos(X, ' ');
    if p>0 then
      n := (left(X,p-1))::float8;
    else
      n:=X::float8;
    end if;
  else
    den:=(substr(X,p+1))::float8;
    if (p<2) and not (check_numeric(X)) then
      return null;
    end if;
    X := trim(left(X, p-1));
    p := strpos(X,' ');
    if p=0 then
      num := x::float8;
    else
      num := (substr(X, p+1, length(X)))::float8;
      n := (left(X,p-1))::float8;
    end if;
  end if;
  if den<>0 then
    n := n + (num / den);
  end if;
  return n::float8;
EXCEPTION
  WHEN invalid_text_representation then
      return null;
end;
$$ language plpgsql immutable strict;
</pre>
check_numeric (s as varchar) as boolean: Takes a string text as input and check if it is a legal number.
<pre class="code">
--
create or replace function check_numeric(varchar) returns bool as $$
declare
i numeric;
begin
i := $1::numeric;
return 't'::bool;
EXCEPTION WHEN invalid_text_representation then
return 'f'::bool;
end;
$$ language plpgsql immutable strict;
</pre>
===Modifying geographical data
<pre class="code">
UPDATE stdplaces SET plccod = substring(plccod8, 1, 6);
UPDATE stdplaces set place = splitog(plcname,' og ',1) where plcname like '% og %';
UPDATE stdplaces set place2 = splitog(plcname,' og ',2) where plcname like '% og %';
</pre>
===Loading coding data for commodoties===
A sample of the data can be [http://solli.h.uib.no/showtab.php?tabname=commodities viewed here]
<pre class="code">
--
-- Commodities
--
DROP TABLE IF EXISTS "commodities";
CREATE TABLE "commodities" (
  id serial not null,
  soortname varchar(110),
  freq integer,
    primsel integer,
    frgnsel integer,
    nbclass integer,
    nbcode char(7),
    ngcode integer,
    sitcr4 char(8),
        jwval double precision,
        prod1 varchar(110),
        prod2 varchar(110),
  PRIMARY KEY (soortname)
);
-- Using header to skip first line which contains null value
COPY commodities (soortname, freq) FROM '/home/arne/STRM/data/commodities.csv' DELIMITER ',' CSV HEADER QUOTE '"' ENCODING 'utf-8' NULL '';
UPDATE commodities set prod1 = splitog(soortname,' og ',1) where (soortname like '% og %');
UPDATE commodities set prod2 = splitog(soortname,' og ',2) where (soortname like '% og %');
UPDATE commodities set prod1 = splitog(soortname,' a ',1) where ((soortname like '% a %') and not (soortname like '% og %'));
UPDATE commodities set prod2 = splitog(soortname,' a ',2) where ((soortname like '% a %') and not (soortname like '% og %'));
--
-- Units for Commodities
--
</pre>
Units for commodities. A sample of the data can be [http://solli.h.uib.no/showtab.php?tabname=communit viewed here]
<pre class="code">
DROP TABLE IF EXISTS "communit";
CREATE TABLE "communit" (
  id serial not null,
  maatname varchar(255),
  freq integer,
        stdmname varchar(100),
    primsel integer,
    frgnsel integer,
    volnum double precision,
    weighnum double precision,
    maatnum double precision,
    valnum double precision,
    lennum double precision,
        jwval double precision,
  PRIMARY KEY (maatname)
);
-- Using header to skip first line which contains null value
COPY communit (maatname, freq) FROM '/home/arne/STRM/data/commodities_unit.csv' DELIMITER ',' CSV HEADER QUOTE '"' ENCODING 'utf-8' NULL '';
update communit set primsel=1;
update communit set jwval=0.0;
</pre>
Load standardised names and metric conversion for units of weigh, length, volume, etc. A sample of the data can be [http://solli.h.uib.no/showtab.php?tabname=metunit viewed here]
Based on Norsk Forlishistorisk Forening. Vedlikeholdes av Trond Johannessen. http://www.maritimt.net/arkforsk/danskem.htm
<pre class="code">
--
-- Load metric units
--
DROP TABLE IF EXISTS "metunit" cascade;
create table "metunit" (
  id_unit serial not null,
  betegnelse varchar(255),
  mtype varchar(10),
  altname varchar(255),
  comm1 varchar(255),
  comm2 varchar(255),
  yearfrom integer,
  yearto integer,
  fork varchar(10),
  forklaring varchar(255),
  antall float8,
  bestanddele varchar(255),
  metric float8,
  siunit char(3),
  tilsvarer float8,
  tunit varchar(20),
  primary key (id_unit)
);
CREATE INDEX ON metunit (betegnelse);
COPY metunit (betegnelse,mtype,altname,comm1,comm2,yearfrom,yearto,fork,forklaring,antall,bestanddele,metric,siunit,tilsvarer,tunit) FROM '/home/arne/STRM/data/danskem.csv' DELIMITER ';' CSV HEADER QUOTE '"' ENCODING 'utf-8' NULL '';
alter table metunit add column primsel integer;
UPDATE metunit set primsel=0;
UPDATE metunit set primsel=1 where ((yearfrom<1858) and (yearto>1683) and ((mtype like 'Vekt') or (mtype like 'Volum') or (mtype like 'Lengde') or (mtype like 'Antall')));
update communit set maatnum = metric, jwval=jarowinkler(betegnelse,maatname), stdmname=betegnelse from metunit where ((metunit.primsel=1) and (mtype='Antall') and (jarowinkler(betegnelse,maatname)>jwval) and jarowinkler(betegnelse,maatname)>0.90);
update communit set volnum = metric, jwval=jarowinkler(betegnelse,maatname), stdmname=betegnelse from metunit where ((metunit.primsel=1) and (mtype='Volum') and (jarowinkler(betegnelse,maatname)>jwval) and jarowinkler(betegnelse,maatname)>0.90);
update communit set weighnum = metric, jwval=jarowinkler(betegnelse,maatname), stdmname=betegnelse from metunit where ((metunit.primsel=1) and (mtype='Vekt') and (jarowinkler(betegnelse,maatname)>jwval) and jarowinkler(betegnelse,maatname)>0.90);
update communit set lennum = metric, jwval=jarowinkler(betegnelse,maatname), stdmname=betegnelse from metunit where ((metunit.primsel=1) and (mtype='Lengde') and (jarowinkler(betegnelse,maatname)>jwval) and jarowinkler(betegnelse,maatname)>0.90);
UPDATE communit set maatnum = 1.0, jwval=jarowinkler('Stk',maatname), stdmname='Stykker' where (jarowinkler('Stk',maatname)>0.90) and (jarowinkler('Stk',maatname)>jwval);
UPDATE communit set maatnum = 12.0, jwval=jarowinkler('Tylter',maatname) where (jarowinkler('Tylter',maatname)>0.90) and (jarowinkler('Tylter',maatname)>jwval);
UPDATE communit set volnum = 231.6, jwval=jarowinkler('Oksehoved',maatname) where (jarowinkler('Oksehoved',maatname)>0.90) and (jarowinkler('Oksehoved',maatname)>jwval);
UPDATE communit set volnum = 138.97, jwval=jarowinkler('Tønder',maatname) where (jarowinkler('Tønder',maatname)>0.90) and (jarowinkler('Tønder',maatname)>jwval);
UPDATE communit set weighnum = 2600, jwval=jarowinkler('Lester',maatname) where (jarowinkler('Lester',maatname)>0.85) and (jarowinkler('Lester',maatname)>jwval);
UPDATE communit set weighnum = 2600, jwval=jarowinkler('Læst',maatname) where (jarowinkler('Læst',maatname)>0.85) and (jarowinkler('Læst',maatname)>jwval);
UPDATE communit set valnum = 1.0, jwval=jarowinkler('Rd',maatname), stdmname='Rdlr' where (jarowinkler('Rd',maatname)>0.90) and (jarowinkler('Rd',maatname)>jwval);
UPDATE communit set valnum = 1.0, jwval=jarowinkler('Rigsdal',maatname), stdmname='Rdlr' where (jarowinkler('Rigsdal',maatname)>0.90) and (jarowinkler('Rigsdal',maatname)>jwval);
UPDATE communit set weighnum = 1016, jwval=jarowinkler('Tons',maatname)  where (jarowinkler('Tons',maatname)>0.90) and (jarowinkler('Tons',maatname)>jwval);
UPDATE communit set weighnum = 6.35, jwval=jarowinkler('Steen',maatname)  where (jarowinkler('Steen',maatname)>0.90) and (jarowinkler('Steen',maatname)>jwval);
</pre>
Load Nina Bangs classification system for commodities. A sample of the data can be [http://solli.h.uib.no/showtab.php?tabname=nbclass viewed here]
<pre class="code">
--
-- Create table nbclass
--
-- "eastwest";"nbclass";"nbclsname";"french";"german";"english";"nbchar"
-- "EB";"1";"Salt ";"sel";"Salz";"salt";"001"
DROP TABLE IF EXISTS "nbclass"  CASCADE;
CREATE TABLE "nbclass" (
  direction char(2),
  nbclass integer,
  nbclsname varchar(255),
  french varchar(255),
  german varchar(255),
  english varchar(255),
  nbchar char(3),
  PRIMARY KEY (nbclass)
);
CREATE INDEX ON nbclass (lower(nbclsname));
COPY nbclass (direction,nbclass,nbclsname,french, german, english, nbchar) FROM '/home/arne/STRM/data/nbclass-export.txt' DELIMITER ';' CSV HEADER QUOTE '"' ENCODING 'utf-8' NULL '';
</pre>
<pre class="code">
Load Nina Bangs coding system system for commodities.
A sample of the data can be [http://solli.h.uib.no/showtab.php?tabname=nbcode viewed here]
--
-- Create table nbcode
--
-- "eastweast";"nbcode";"nbvare";"nbclass";"Level3"
-- "EB";"001.001";"Salt";1,00;0,00
DROP TABLE IF EXISTS "nbcode"  CASCADE;
CREATE TABLE "nbcode" (
  direction char(2),
  nbcode char (7),
  nbvare varchar(255),
  nbclass integer,
  level3 integer,
  PRIMARY KEY (nbcode)
);
CREATE INDEX ON nbcode (lower(nbvare));
COPY nbcode (direction,nbcode,nbvare,nbclass, level3) FROM '/home/arne/STRM/data/nbcode-export.txt' DELIMITER ';' CSV HEADER QUOTE '"' ENCODING 'utf-8' NULL '';
Alter table nbcode add column altname varchar(255);
update nbcode set altname = replace(nbvare,'æ','e') where nbvare like ('%æ%');
update nbcode set altname = replace(nbvare,'Æ','E') where nbvare like ('%Æ%');
</pre>
Update commodites coding table according to Load Nina Bangs coding and classification system using
<pre class="code">
UPDATE commodities set nbclass = nbclass.nbclass FROM nbclass where (commodities.soortname = nbclass.nbclsname);
update commodities set jwval=0.0;
UPDATE commodities set nbcode = nbcode.nbcode, jwval=1.0, nbclass=nbcode.nbclass FROM nbcode where (commodities.soortname = nbcode.nbvare);
-- Update ballast
--
update commodities set nbclass = -1, jwval=jarowinkler(commodities.soortname,'Ballast'), nbcode='-1', ngcode=-1 where ((jarowinkler(commodities.soortname,'Ballast')>jwval) and (jarowinkler(commodities.soortname,'Ballast')>0.93) and (commodities.soortname like '%Ballast%'));
-- Update on commodity name
--
UPDATE commodities set nbcode = nbcode.nbcode, jwval=jarowinkler(commodities.soortname,nbcode.nbvare), nbclass=nbcode.nbclass FROM nbcode where ((commodities.freq >0) and (jarowinkler(commodities.soortname,nbcode.nbvare)>jwval) and (jarowinkler(commodities.soortname,nbcode.nbvare)>0.89));
-- Update on commodity alternate name, æ/e problem
--
UPDATE commodities set nbcode = nbcode.nbcode, jwval=jarowinkler(commodities.soortname,nbcode.altname), nbclass=nbcode.nbclass FROM nbcode where ((commodities.freq >0) and (jarowinkler(commodities.soortname,nbcode.altname)>jwval) and (jarowinkler(commodities.soortname,nbcode.altname)>0.89) and not (altname=''));
-- Update on split of commodity , first product, " og " and " a " problem
--
UPDATE commodities set nbcode = nbcode.nbcode, jwval=jarowinkler(commodities.prod1,nbcode.nbvare), nbclass=nbcode.nbclass FROM nbcode where ((commodities.freq >0) and (jarowinkler(commodities.prod1,nbcode.nbvare)>jwval) and (jarowinkler(commodities.prod1,nbcode.nbvare)>0.89));
</pre>
===Coding raw data===
<pre class="code">
--
-- Table structure for table 'cdoorvaart'
--
DROP TABLE IF EXISTS "cdoorvaart";
CREATE TABLE "cdoorvaart" (
  "id_doorvaart" INTEGER PRIMARY KEY NOT NULL,
  "dag"  INTEGER,
  "maand"  INTEGER,
  "jaar"  INTEGER,
  "volgnummer"  integer,
  "smhport"  CHAR(6),
  "totaal_daalder"  double precision,
  "totaal_skilling"  double precision,
  "tottax" double  precision,
  "schipper_voornamen"  CHAR(4),
  "schipper_patroniem"  CHAR(4),
  "schipper_achternaam"  CHAR(4),
  vanfrom char(6),
  naarto char(6),
  direction char(2),
  export integer
);
INSERT into cdoorvaart (id_doorvaart, dag, maand, jaar, volgnummer) select id_doorvaart, dag, maand, jaar, volgnummer::int from doorvaart;
UPDATE cdoorvaart SET totaal_daalder =  cast(doorvaart.totaal_daalder as double precision) FROM doorvaart WHERE ((cdoorvaart.id_doorvaart=doorvaart.id_doorvaart) and (check_numeric(doorvaart.totaal_daalder)));
UPDATE cdoorvaart SET totaal_skilling =  cast(doorvaart.totaal_skilling as double precision) FROM doorvaart WHERE ((cdoorvaart.id_doorvaart=doorvaart.id_doorvaart) and (check_numeric(doorvaart.totaal_skilling)));
UPDATE cdoorvaart SET totaal_daalder = frac2num(doorvaart.totaal_daalder) from doorvaart where ((cdoorvaart.id_doorvaart=doorvaart.id_doorvaart) and (doorvaart.totaal_daalder<>'') and not (check_numeric(doorvaart.totaal_daalder)));
UPDATE cdoorvaart SET totaal_skilling =  frac2num(doorvaart.totaal_skilling) FROM doorvaart WHERE ((cdoorvaart.id_doorvaart=doorvaart.id_doorvaart) and (doorvaart.totaal_skilling<>'') and not (check_numeric(doorvaart.totaal_skilling)));
UPDATE cdoorvaart set tottax = totaal_daalder + (totaal_skilling/96);
UPDATE cdoorvaart set smhport = smhport.plccod FROM smhport, doorvaart WHERE ((doorvaart.schipper_plaatsnaam = smhport.name) and (doorvaart.id_doorvaart=cdoorvaart.id_doorvaart));
--
-- Table structure for table 'clading'
--
DROP TABLE IF EXISTS "clading" CASCADE;
CREATE TABLE "clading" (
  "id_doorvaart" INTEGER NOT NULL,
  "regel" INTEGER NOT NULL,
  "vanfrom" CHAR(6),
  "naarto" CHAR(6),
  "maat" double precision,
  "aantal" double precision,
  "ebsoort" CHAR(7),
  "wbsoort" CHAR(7),
  "daalder" double precision,
  "skilling" double precision,
  "rddeci" double precision,
  "metmeasure" double precision,
  siunit CHAR(3);
  year integer,
  direction char(2),
  export integer,
  PRIMARY KEY (id_doorvaart, regel)
);
INSERT into clading (id_doorvaart, regel) select id_doorvaart, regel from lading;
UPDATE clading SET daalder =  cast(lading.daalder as double precision) FROM lading WHERE ((clading.id_doorvaart=lading.id_doorvaart) and (clading.regel=lading.regel) and (check_numeric(lading.daalder)));
UPDATE clading SET daalder =  frac2num(lading.daalder) FROM lading WHERE ((clading.id_doorvaart=lading.id_doorvaart) and (clading.regel=lading.regel)  and (lading.daalder<>'') and not (check_numeric(lading.daalder)));
UPDATE clading SET skilling =  cast(lading.skilling as double precision) FROM lading WHERE ((clading.id_doorvaart=lading.id_doorvaart)and (clading.regel=lading.regel) and (check_numeric(lading.skilling)));
UPDATE clading SET skilling =  frac2num(lading.skilling) FROM lading WHERE ((clading.id_doorvaart=lading.id_doorvaart)and (clading.regel=lading.regel)  and (lading.skilling<>'') and not (check_numeric(lading.skilling)));
UPDATE clading set rddeci = daalder + (skilling/96);
update clading set rddeci = daalder where skilling is null;
UPDATE clading set rddeci = (skilling/96) where daalder is null;
UPDATE clading SET aantal =  cast(lading.aantal as double precision) FROM lading WHERE ((clading.id_doorvaart=lading.id_doorvaart) and (lading.regel=clading.regel) and (check_numeric(lading.aantal)));
UPDATE clading SET aantal =  frac2num(lading.aantal) FROM lading WHERE ((clading.id_doorvaart=lading.id_doorvaart) and (lading.regel=clading.regel) and (lading.aantal<>'') and not (check_numeric(lading.aantal)));
UPDATE clading set vanfrom = portdept.plccod FROM portdept, lading WHERE ((lading.id_doorvaart=clading.id_doorvaart) and (lading.regel=clading.regel) and (lading.van = portdept.name));
UPDATE clading set naarto = portdest.plccod FROM portdest, lading WHERE ((lading.id_doorvaart=clading.id_doorvaart) and (lading.regel=clading.regel) and (lading.naar = portdest.name));
UPDATE clading set direction = concat(substr(uniqport.eastwest,1,1), 'B') FROM uniqport WHERE (clading.naarto=uniqport.plccod);
UPDATE clading set maat = communit.maatnum, siunit='stk' FROM communit, lading WHERE ((lading.id_doorvaart=clading.id_doorvaart) and (lading.regel=clading.regel) and (lading.maat = communit.maatname) and (communit.maatnum>0));
UPDATE clading set maat = communit.volnum, siunit='L' FROM communit, lading WHERE ((lading.id_doorvaart=clading.id_doorvaart) and (lading.regel=clading.regel) and (lading.maat = communit.maatname) and (communit.volnum>0));
UPDATE clading set maat = communit.weighnum, siunit='kg' FROM communit, lading WHERE ((lading.id_doorvaart=clading.id_doorvaart) and (lading.regel=clading.regel) and (lading.maat = communit.maatname) and (communit.weighnum>0));
UPDATE clading set maat = communit.lennum, siunit='m' FROM communit, lading WHERE ((lading.id_doorvaart=clading.id_doorvaart) and (lading.regel=clading.regel) and (lading.maat = communit.maatname) and (communit.lennum>0));
UPDATE clading SET metmeasure = aantal * maat;
Update clading set year = jaar from cdoorvaart where (clading.id_doorvaart=cdoorvaart.id_doorvaart);
-- cdoorvaart
update cdoorvaart set vanfrom = clading.vanfrom, naarto = clading.naarto from clading where ((clading.id_doorvaart=cdoorvaart.id_doorvaart) and (clading.regel=1));
UPDATE cdoorvaart set direction = concat(substr(uniqport.eastwest,1,1), 'B') FROM uniqport WHERE (cdoorvaart.naarto = uniqport.plccod);
-- direction and Nina Bang coding
update clading set ebsoort = nbcode from commodities, lading WHERE ((lading.id_doorvaart=clading.id_doorvaart) and (lading.regel=clading.regel) and (lading.soort = commodities.soortname));
update clading set wbsoort = nbcode from commodities, lading WHERE ((lading.id_doorvaart=clading.id_doorvaart) and (lading.regel=clading.regel) and (lading.soort = commodities.soortname));
--
-- Table structure for table 'cbelasting'
--
DROP TABLE IF EXISTS "cbelasting";
CREATE TABLE "cbelasting" (
  "id_doorvaart" INTEGER NOT NULL,
  "regel" INTEGER NOT NULL,
  "naam" CHAR(2),
  "muntsoort" CHAR(2),
  "hoofdbedrag" double precision,
  "subbedrag" double precision,
  "korting" CHAR(2),
  "totbedrag" double  precision,
  export integer,
  PRIMARY KEY (id_doorvaart, regel)
);
INSERT into cbelasting (id_doorvaart, regel) select id_doorvaart, regel from belasting;
UPDATE cbelasting SET hoofdbedrag =  cast(belasting.hoofdbedrag as double precision) FROM belasting WHERE ((cbelasting.id_doorvaart=belasting.id_doorvaart) and (cbelasting.regel=belasting.regel) and (check_numeric(belasting.hoofdbedrag)));
UPDATE cbelasting SET subbedrag =  cast(belasting.subbedrag as double precision) FROM belasting WHERE ((cbelasting.id_doorvaart=belasting.id_doorvaart) and (cbelasting.regel=belasting.regel) and (check_numeric(belasting.subbedrag)));
UPDATE cbelasting set totbedrag = hoofdbedrag + subbedrag where ((hoofdbedrag is not null) and (subbedrag is not null));
UPDATE cbelasting set totbedrag = hoofdbedrag where (subbedrag is null);
UPDATE cbelasting set totbedrag = subbedrag where (hoofdbedrag is null);
</pre>
===Extraction of a sample===
This example show how every passage that have cargoes to Norway, Spain and Portugal are extracted into the sample  tables
<pre class="code">
--
-- Set export indicators
--
update cbelasting set export=0;
update cdoorvaart set export=0;
update clading set export=0;
update uniqport set primsel=0;
update uniqport set primsel=1 where ((uniqport.region='Norway') or (uniqport.region='Spain') or (uniqport.region='Portugal'));
update clading set export=1 from uniqport where ((clading.naarto=uniqport.plccod) and (uniqport.primsel=1));
update cdoorvaart set export=1 from clading where ((cdoorvaart.id_doorvaart=clading.id_doorvaart) and (clading.export=1));
update cbelasting set export=1 from cdoorvaart where ((cbelasting.id_doorvaart=cdoorvaart.id_doorvaart) and (cdoorvaart.export=1));
--
-- Make export tables
-- 
drop table IF EXISTS ctax cascade;
drop table IF EXISTS cpassage cascade;
drop table IF EXISTS ccargo cascade;
create table ctax as select * from cbelasting where export=1;
create table cpassage as select * from cdoorvaart where export=1;
create table ccargo as select * from clading where export=1;
</pre>


==Classification systems by Nina Bang==
==Classification systems by Nina Bang==

Siste sideversjon per 3. des. 2015 kl. 15:52

STRC: A coded version of STR

Standardisation in progress

The standardisation, coding, classification and conversion (coding in short) of the STR data is a work in progress (and in its infancy). The procedures below codes a larger part of the STR data and make a 5% sample available for crosstabulation.

The process starts with loading the raw data into a database. Then secondly a large number of auxilliary tables are loaded and manipulated using queries, functions, semi-coded data etc.

Thirdly the data itself is coded which creates a paralell set of coded raw data. Since the raw data consist of 6-7 million records a subset is created and made available at the internet for viewing and cross-tabulation

Standardisation using SQL

The raw data has been coded using Postgresql version 9.4 running on Ubuntu 14.04, while the online verions of the coded data has been imported on a different version of linux running postgres 9.1.

Loading main DATA

Please note that the following documentation of the import uses absolute file names. When using these routines the file location must be changed according to each users directory structure and where the data is downloaded.

Passages

If passages is downloaded from STRO using the query interface a CSV-file called passages_Year_=_YYYY.csv (e.g. YYYY=1754) will be created. The file consist of the selected passsages, first line includes field names, each field is seperated with comma and values are qouted using dobble-qoute ". Exported data is encoded in "Latin1". By mailing the STRO-group it is also possible to get larger chucks of data, however it is encoded in UTF-8, lacks field names and each field are seperated by semi-colon. The field name of all data is doorvaarten.csv. Therefore two import routines have been made, one for each of the supplied data format. Please note that the table definition are the same.

Import the complete data of passages

Passages are imported into a table named doorvaart, primary key is id_doorvaart.

-- Table structure for table 'doorvaart'
--
-- 
DROP TABLE IF EXISTS "doorvaart";

CREATE TABLE "doorvaart" (
  "id_doorvaart" INTEGER PRIMARY KEY NOT NULL,
  "dag"  INTEGER,
  "maand"  INTEGER,
  "jaar"  INTEGER,
  "volgnummer"  VARCHAR(255),
  "schipper_voornamen"  VARCHAR(255),
  "schipper_patroniem"  VARCHAR(255),
  "schipper_tussenvoegsel"  VARCHAR(255),
  "schipper_achternaam"  VARCHAR(255),
  "schipper_plaatsnaam"  VARCHAR(255),
  "soort_korting"  VARCHAR(255),
  "korting_daalder"  VARCHAR(255),
  "korting_skilling"  VARCHAR(255),
  "subtotaal1_daalder"  VARCHAR(255),
  "subtotaal1_skilling"  VARCHAR(255),
  "subtotaal2_daalder"  VARCHAR(255),
  "subtotaal2_skilling"  VARCHAR(255),
  "totaal_rosenobel"  VARCHAR(255),
  "totaal_daalder"  VARCHAR(255),
  "totaal_skilling"  VARCHAR(255),
  "andere_muntsoort1"  VARCHAR(255),
  "andere_muntsoort1_totaal"  VARCHAR(255),
  "andere_muntsoort2"  VARCHAR(255),
  "andere_muntsoort2_totaal"  VARCHAR(255),
  "privilege"  VARCHAR(255),
  "opmerking_bron"  VARCHAR(500),
  "opmerking_invoerder"  VARCHAR(255),
  "opmerking_controle"  VARCHAR(255),
  "controle_1"  VARCHAR(255),
  "controle_2"  VARCHAR(255),
  "controle_3"  VARCHAR(255),
  "naam_invoerder"  VARCHAR(255),
  "datum_opgevoerd"  VARCHAR(255),
  "dirnummer"  VARCHAR(255),
  "tonnage"  VARCHAR(255),
  "belasting_korting"  VARCHAR(255),
  "images"  VARCHAR(255),
  "hulp"  VARCHAR(255),
  "hulp1"  VARCHAR(255),
  "korting_ort"  VARCHAR(255),
  "subtotaal1_ort"  VARCHAR(255),
  "subtotaal2_ort"  VARCHAR(255),
  "totaal_ort"  VARCHAR(255),
  "hulp2" VARCHAR(255)
);

COPY doorvaart FROM '/home/arne/STRM/data/doorvaarten.csv' DELIMITER ',' CSV HEADER QUOTE '"' ENCODING 'utf-8' NULL '';
Import selected and downloaded passages

The following code in Postgresql (SQL of the DBMS Postgres) imports the downloaded data into a table (relation) called doorvart. The complete code for importing all three tables, passages, cargoes and taxes can be downloaded here.

-- Table structure for table 'doorvaart'
--
--  "id_doorvaart";"dag";"maand";"jaar";"volgnummer";"schipper_voornamen";"schipper_patroniem";"schipper_tussenvoegsel";"schipper_achternaam";"schipper_plaatsnaam";"soort_korting";"korting_daalder";"korting_skilling";"subtotaal1_daalder";"subtotaal1_skilling";"subtotaal2_daalder";"subtotaal2_skilling";"totaal_rosenobel";"totaal_daalder";"totaal_skilling";"andere_muntsoort1";"andere_muntsoort1_totaal";"andere_muntsoort2";"andere_muntsoort2_totaal";"privilege";"opmerking_bron";"opmerking_invoerder";"opmerking_controle";"controle_1";"controle_2";"controle_3";"naam_invoerder";"datum_opgevoerd";"dirnummer";"tonnage";"belasting_korting";"images";"hulp";"hulp1";"korting_ort";"subtotaal1_ort";"subtotaal2_ort";"totaal_ort";"hulp2"
-- "410987";"29";"6";"1754";"373";"Gerrit";"";"";"Hendrichs";"Workum";"";"";"";"";"";"";"";"";"2";"";"";"";"";"";"";"";"";"";"J";"N";"N";"ERICB";"31-03-10";"226";"";"";"";"";"";"";"";"";"";""

Create table as above, then


COPY doorvaart FROM '/home/arne/STRM/data/passages_Year_=_1754.csv' DELIMITER ';' CSV HEADER QUOTE '"' ENCODING 'latin1' NULL '';

Taxes

Import all data
--
-- Table structure for table 'belastingen'
--


DROP TABLE IF EXISTS "belasting";

CREATE TABLE "belasting" (
  "id_doorvaart" INTEGER NOT NULL,
  "regel" INTEGER NOT NULL,
  "naam" VARCHAR(255),
  "muntsoort" VARCHAR(255),
  "hoofdbedrag" VARCHAR(255),
  "subbedrag" VARCHAR(255),
  "korting" VARCHAR(255),
  "hulp" VARCHAR(255),
  PRIMARY KEY (id_doorvaart, regel)
);

COPY belasting FROM '/home/arne/STRM/data/belastingen.csv' DELIMITER ',' CSV HEADER QUOTE '"' ENCODING 'utf-8' NULL '';

Import selected data

--
-- Table structure for table 'belasting'
--

-- "id_doorvaart";"regel";"naam";"muntsoort";"hoofdbedrag";"subbedrag";"korting";"hulp"
-- "410987";"1";"Fyrpenge";"Daler";"2";"";"N";""

DROP TABLE IF EXISTS "belasting";

CREATE TABLE "belasting" (
  "id_doorvaart" INTEGER NOT NULL,
  "regel" INTEGER NOT NULL,
  "naam" VARCHAR(255),
  "muntsoort" VARCHAR(255),
  "hoofdbedrag" VARCHAR(255),
  "subbedrag" VARCHAR(255),
  "korting" VARCHAR(255),
  "hulp" VARCHAR(255),
  PRIMARY KEY (id_doorvaart, regel)
);

COPY belasting (id_doorvaart, regel, naam, muntsoort, hoofdbedrag, subbedrag, korting, hulp) FROM '/home/arne/STRM/data/taxes_Year_=_1754.csv' DELIMITER ';' CSV HEADER QUOTE '"' ENCODING 'latin1' NULL '';

Cargoes

Cargoes is imported into a table lading. In order to achieve higher performance on the later coding of data three indexes are made.

Importing all data

--
-- Table structure for table 'ladingen'
--

DROP TABLE IF EXISTS "lading";

CREATE TABLE "lading" (
  "id_doorvaart" INTEGER NOT NULL,
  "regel" INTEGER NOT NULL,
  "van" VARCHAR(255),
  "naar" VARCHAR(255),
  "maat" VARCHAR(255),
  "aantal" VARCHAR(255),
  "soort" VARCHAR(255),
  "daalder" VARCHAR(255),
  "skilling" VARCHAR(255),
  "maat_alt" VARCHAR(255),
  "aantal_alt" VARCHAR(255),
  "ort" VARCHAR(255),
  "hulp" VARCHAR(255),
  PRIMARY KEY (id_doorvaart, regel)
);

COPY lading FROM '/home/arne/STRM/data/ladingen.csv' DELIMITER ',' CSV HEADER QUOTE '"' ENCODING 'utf-8' NULL '';


Importing a selected subset
--
-- Table structure for table 'lading'
--
-- "id_doorvaart";"regel";"van";"naar";"maat";"aantal";"soort";"daalder";"skilling";"maat_alt";"aantal_alt";"ort";"hulp"
-- "410987";"1";"Amst.";"Narva";"";"";"Ballast";"";"";"";"";"";""

Create table as above, then import


COPY lading FROM '/home/arne/STRM/data/cargoes_Year_=_1754.csv' DELIMITER ';' CSV HEADER QUOTE '"' ENCODING 'latin1' NULL '';

Loading aux DATA

In order to manage the automatic coding and converting of data a set of auxillary tables have been created. These tables could be called inverted tables, since each table consist of the unique field values for selected field, is frequency. To each of these tables standardised or coded fields will be added. Later in the coding process the uniques spelling of places, commodities, measurements etc. will be replaced by its code. This type of coding is context-insensitive, i.e. it does not take the context of the data into consideration. In a context-sensitive coding the context of the data is important, e.g. the same measurement expressed in words has a different metric value if the ship departed from Bergen or from Riga. Many measurements have this property, a scottish pound was different from a London pound. Measurements also changed over time, the length of an inch was different not only in place but changed also over time Context-insensitive coding/conversion is done purely on the meaning of the word itself and does not take time or space into consideration. The time-space conversion is done on a later stage in order to improve and make the context-sensitive coding.

Standardised place names

The standardisation of placenames have been done by George Welling (RUG) and X Y from Tresoar archive. The file all_places_standard.csv is loaded into a table named stdplaces. University of Groningen and the TRESOAR archive have standardised ports of departure, destination and shipmasters home port. By end of 2015 about 50.000 spelling variations of port names have been coded into a 6 letter alfanumeric code.

Please note that in stdplaces is using a 8 letter code, the first three letters is numeric, the three alfa (A-Z) and some places are coded with addtional 2 digits. The two digits are used if the place name consist of two different places, e.g. the ship was intended to go to Spain however it planned to go to Norway to do extra loading before sailing to Spain. This way of coding the two destion ports are not recommended, rather the two destionations should have been coded into two 6 letter codes in order to have a strict coding system. In order to achive this following procedure is done:

  • Two six letter plcodes have been created
  • If the placename consist of two placenames, these are automatically sepearated

Stdplaces is not part of the coded database. To get the data please contact STRO

-- Create table stdplaces
-- "schipper","Aace","000000","Unknown","Unindentifiable Ports","Unknown",,"1"
--

DROP TABLE IF EXISTS "stdplaces";

CREATE TABLE "stdplaces" (
    plctype varchar(8),
    plcname varchar(255),
    plccod8 varchar(8),
    stdname varchar(255),
    region varchar(255),
    bigregion varchar(255),
    bfield boolean,
    onefield varchar(255)
);

CREATE INDEX ix_pclname ON stdplaces (lower(plcname));
CREATE INDEX ix_stdname ON stdplaces (lower(stdname));

COPY stdplaces (plctype, plcname, plccod8, stdname, region, bigregion, bfield, onefield) FROM '/home/arne/STRM/data/all_places_standard.csv' DELIMITER ',' CSV HEADER QUOTE '"' ENCODING 'utf-8' NULL '';
  
ALTER TABLE stdplaces DROP COLUMN bfield;
ALTER TABLE stdplaces DROP COLUMN onefield;
ALTER TABLE stdplaces ADD COLUMN plccod varchar(6);
ALTER TABLE stdplaces ADD COLUMN plcco2 varchar(6);
ALTER TABLE stdplaces ADD COLUMN place varchar(255);
ALTER TABLE stdplaces ADD COLUMN place2 varchar(255);


UPDATE stdplaces SET plccod = substring(plccod8, 1, 6);

Port of departure

A sample of the data can be viewed here

--
-- Port of departure
--
DROP TABLE IF EXISTS "portdept";

CREATE TABLE "portdept" (
  id serial not null,
  name varchar(255),
  freq integer,
    plccod char(6),
    primsel integer,
    frgnsel integer,
  PRIMARY KEY (name)
);

-- Using header to skip first line which contains null value

COPY portdept (name, freq) FROM '/home/arne/STRM/data/ports_departure.csv' DELIMITER ',' CSV HEADER QUOTE '"' ENCODING 'utf-8' NULL '';

Port of destination

All unique spelling variations of name of port of destination and the frequency of the term.

A sample of the data can be viewed here

--
-- Port of destination
--
DROP TABLE IF EXISTS "portdest";

CREATE TABLE "portdest" (
  id serial not null,
  name varchar(255),
  freq integer,
    plccod varchar(6),
    frgnsel integer,
    primsel integer,
  PRIMARY KEY (name)
);

-- Using header to skip first line which contains null value

COPY portdest (name, freq) FROM '/home/arne/STRM/data/ports_destination.csv' DELIMITER ',' CSV HEADER QUOTE '"' ENCODING 'utf-8' NULL '';

Ship masters home port

All unique spelling variations of name of ship masters home port and the frequency of the term

A sample of the data can be viewed here

--
-- Ship masters home Port
--
DROP TABLE IF EXISTS "smhport";

CREATE TABLE "smhport" (
  id serial not null,
  name varchar(255),
  freq integer,
    plccod char(6),
    frgnsel integer,
    primsel integer,
  PRIMARY KEY (name)
);

-- Using header to skip first line which contains null value

COPY smhport (name, freq) FROM '/home/arne/STRM/data/shipmaster_home_ports.csv' DELIMITER ',' CSV HEADER QUOTE '"' ENCODING 'utf-8' NULL '';

Classifcation or Subsets created from stdplaces

Unique ports

A table of unique ports are created from all_places_standardised. The six alfanumeric code is used as identification of a unique port. The table uniqport is used during coding of geograpichal codes and classification of regions, current country etc. A select-query creates the table uniqports and afterwards new columns are added and inititated.

A sample of the data can be viewed here

--
-- Create table uniqport
--
DROP TABLE IF EXISTS "uniqport"  CASCADE;

SELECT substring(plccod8,1, 6) AS plccod, min(stdname) AS stdname, min(region) as region, min(bigregion) as bigregion INTO uniqport FROM stdplaces where (stdname is NOT NULL) GROUP BY substring(plccod8,1, 6);

ALTER TABLE uniqport ADD COLUMN currcntry varchar(25);
ALTER TABLE uniqport ADD COLUMN eastwest char(2);
ALTER TABLE uniqport ADD COLUMN primsel integer;
ALTER TABLE uniqport ADD PRIMARY KEY(plccod);

UPDATE uniqport set currcntry = region where ((region='Norway') or (region='Scotland') or (region='Italy') or (region='Ireland') or (region='Denmark'));
UPDATE uniqport set currcntry = region where ((region='Spain') or (region='Esthonia') or (region='Iceland') or (region='Portugal') or (region='The Netherlands'));
UPDATE uniqport set currcntry = 'France' where (region LIKE 'France%');
UPDATE uniqport set currcntry = 'Germany' where (region LIKE 'Hamburg%');
UPDATE uniqport set primsel=0;

UPDATE portdept SET plccod =  stdplaces.plccod FROM stdplaces WHERE stdplaces.plcname=portdept.name;
UPDATE portdest SET plccod =  stdplaces.plccod FROM stdplaces WHERE stdplaces.plcname=portdest.name;
UPDATE smhport SET plccod =  stdplaces.plccod FROM stdplaces WHERE stdplaces.plcname=smhport.name;

Region

A sample of the data can be viewed here

--
-- Create table region
--

DROP TABLE IF EXISTS "region";

SELECT region as name, count(region) as freq, min(bigregion) as bigregion INTO region FROM stdplaces where (region is NOT NULL) GROUP BY region;
ALTER TABLE region ADD COLUMN currcntry varchar(25);
ALTER TABLE region ADD COLUMN eastwest char(1);
ALTER TABLE region ADD PRIMARY KEY(name);

UPDATE region set eastwest = bigregion.eastwest FROM bigregion where bigregion.name=region.bigregion;

UPDATE region set eastwest = 'W' where ((name like 'The Neth%') or (name='Norway') or (name like '%Medi%') or (name like 'Engl%')  or (name like '%Atla%'));
UPDATE region set eastwest = 'E' where (name like '%Baltic%');

UPDATE uniqport set eastwest = region.eastwest from region where uniqport.region = region.name;
Big regions

A sample of the data can be viewed here

--
-- Create table bigregion
--

DROP TABLE IF EXISTS "bigregion";

SELECT bigregion as name, count(bigregion) as freq, min(region) as minregion, max(region) as maxregion INTO bigregion FROM stdplaces where (bigregion is NOT NULL) GROUP BY bigregion;
ALTER TABLE bigregion ADD COLUMN eastwest char(1);
ALTER TABLE bigregion ADD PRIMARY KEY(name);

UPDATE bigregion set eastwest = 'W' where ((name like 'Dutc%') or (name like 'Nor%') or (name like '%Medi%') or (name like 'Great%') or (name like '%Amer%') or (name like '%Atla%') or (name like 'Ice%') or (name like 'Arc%') or (name like '%Far East%') or (name like 'West%'));
UPDATE bigregion set eastwest = 'E' where (name like '%Baltic%');

Creating functions

Advanced text comparsions functions is nessesary since the data consist of many spelling variations of the same word. In order to compare places, commodties, measurements, classes etc the jaro-winkler function for word or string similarity has been added to the database. This can be downloaded from github and loaded into the database simply by the statement

\i pg_similarity.sql

In order to convert fractions to decimal numbers, and to split combined fields like placenames or commodities three general functions have been created

splitog (s, splitstring, fieldno) as text: Takes a input, e.g. destination port "København og Riga" or commodities "Silke og bommuld" and split the string into two. Input is the text to be split, second the split argument and thirdly which part to return, first or second


create or replace function splitog(varchar,varchar,integer) returns text as $$
declare
p integer;
X text;
sub text;
nret integer;
lsub integer;
begin
  X := trim(both $1::text);
  sub:= $2::text;
  nret := $3::integer;
  lsub := length(sub);
  if ((X='') or (X='-')) then
    return null;
  end if;
  p := 0;
  p := strpos(X, sub);
  if (p>1) then
    if nret=1 then
       X := trim(both substr(X,1,p));
    else
       X := trim(both substr(X, p+lsub,length(X)));
    end if;
  end if;
  return X::text;
EXCEPTION
  WHEN invalid_text_representation then
      return null;
end;
$$ language plpgsql immutable strict;


frac2num (s text) as double precision: Takes a string input, e.g. '2 3/4' and converts the text into 2.75.

create or replace function frac2num(varchar) returns float8 as $$
declare
p integer;
n float8;
num float8;
den float8;
X text;
begin
  X := trim(both $1::text);
  if ((X='') or (X='-')) then
    return null;
  end if;
  p := 0;
  n := 0;
  p := strpos(X, '/');
  if (p=0) then
    p:=strpos(X, ' ');
    if p>0 then
       n := (left(X,p-1))::float8;
    else 
       n:=X::float8;
    end if;
  else
    den:=(substr(X,p+1))::float8;
    if (p<2) and not (check_numeric(X)) then
      return null;
    end if;
    X := trim(left(X, p-1));
    p := strpos(X,' ');
    if p=0 then
       num := x::float8;
    else
       num := (substr(X, p+1, length(X)))::float8;
       n := (left(X,p-1))::float8;
    end if;
  end if;
  if den<>0 then
     n := n + (num / den);
  end if;
  return n::float8;
EXCEPTION
  WHEN invalid_text_representation then
      return null;
end;
$$ language plpgsql immutable strict;


check_numeric (s as varchar) as boolean: Takes a string text as input and check if it is a legal number.


--
create or replace function check_numeric(varchar) returns bool as $$
declare
i numeric;
begin
i := $1::numeric;
return 't'::bool;
EXCEPTION WHEN invalid_text_representation then
return 'f'::bool;
end;
$$ language plpgsql immutable strict;

===Modifying geographical data


UPDATE stdplaces SET plccod = substring(plccod8, 1, 6);
UPDATE stdplaces set place = splitog(plcname,' og ',1) where plcname like '% og %';
UPDATE stdplaces set place2 = splitog(plcname,' og ',2) where plcname like '% og %';

Loading coding data for commodoties

A sample of the data can be viewed here

--
-- Commodities
--

DROP TABLE IF EXISTS "commodities";

CREATE TABLE "commodities" (
  id serial not null,
  soortname varchar(110),
  freq integer,
    primsel integer,
    frgnsel integer,
    nbclass integer,
    nbcode char(7),
    ngcode integer,
    sitcr4 char(8),
        jwval double precision,
        prod1 varchar(110),
        prod2 varchar(110),
  PRIMARY KEY (soortname)
);

-- Using header to skip first line which contains null value

COPY commodities (soortname, freq) FROM '/home/arne/STRM/data/commodities.csv' DELIMITER ',' CSV HEADER QUOTE '"' ENCODING 'utf-8' NULL '';

UPDATE commodities set prod1 = splitog(soortname,' og ',1) where (soortname like '% og %');
UPDATE commodities set prod2 = splitog(soortname,' og ',2) where (soortname like '% og %');
UPDATE commodities set prod1 = splitog(soortname,' a ',1) where ((soortname like '% a %') and not (soortname like '% og %'));
UPDATE commodities set prod2 = splitog(soortname,' a ',2) where ((soortname like '% a %') and not (soortname like '% og %'));
--
-- Units for Commodities
--

Units for commodities. A sample of the data can be viewed here



DROP TABLE IF EXISTS "communit";

CREATE TABLE "communit" (
  id serial not null,
  maatname varchar(255),
  freq integer,
        stdmname varchar(100),
    primsel integer,
    frgnsel integer,
    volnum double precision,
    weighnum double precision,
    maatnum double precision,
    valnum double precision,
    lennum double precision,
        jwval double precision,
  PRIMARY KEY (maatname)
);

-- Using header to skip first line which contains null value

COPY communit (maatname, freq) FROM '/home/arne/STRM/data/commodities_unit.csv' DELIMITER ',' CSV HEADER QUOTE '"' ENCODING 'utf-8' NULL '';

update communit set primsel=1;
update communit set jwval=0.0;

Load standardised names and metric conversion for units of weigh, length, volume, etc. A sample of the data can be viewed here Based on Norsk Forlishistorisk Forening. Vedlikeholdes av Trond Johannessen. http://www.maritimt.net/arkforsk/danskem.htm

--
-- Load metric units
--
DROP TABLE IF EXISTS "metunit" cascade;

create table "metunit" (
  id_unit serial not null,
  betegnelse varchar(255),
  mtype varchar(10),
  altname varchar(255),
  comm1 varchar(255),
  comm2 varchar(255),
  yearfrom integer,
  yearto integer,
  fork varchar(10),
  forklaring varchar(255),
  antall float8,
  bestanddele varchar(255),
  metric float8,
  siunit char(3),
  tilsvarer float8,
  tunit varchar(20),
  primary key (id_unit)
);

CREATE INDEX ON metunit (betegnelse);

COPY metunit (betegnelse,mtype,altname,comm1,comm2,yearfrom,yearto,fork,forklaring,antall,bestanddele,metric,siunit,tilsvarer,tunit) FROM '/home/arne/STRM/data/danskem.csv' DELIMITER ';' CSV HEADER QUOTE '"' ENCODING 'utf-8' NULL '';

alter table metunit add column primsel integer;

UPDATE metunit set primsel=0;
UPDATE metunit set primsel=1 where ((yearfrom<1858) and (yearto>1683) and ((mtype like 'Vekt') or (mtype like 'Volum') or (mtype like 'Lengde') or (mtype like 'Antall')));

update communit set maatnum = metric, jwval=jarowinkler(betegnelse,maatname), stdmname=betegnelse from metunit where ((metunit.primsel=1) and (mtype='Antall') and (jarowinkler(betegnelse,maatname)>jwval) and jarowinkler(betegnelse,maatname)>0.90);
update communit set volnum = metric, jwval=jarowinkler(betegnelse,maatname), stdmname=betegnelse from metunit where ((metunit.primsel=1) and (mtype='Volum') and (jarowinkler(betegnelse,maatname)>jwval) and jarowinkler(betegnelse,maatname)>0.90);
update communit set weighnum = metric, jwval=jarowinkler(betegnelse,maatname), stdmname=betegnelse from metunit where ((metunit.primsel=1) and (mtype='Vekt') and (jarowinkler(betegnelse,maatname)>jwval) and jarowinkler(betegnelse,maatname)>0.90);
update communit set lennum = metric, jwval=jarowinkler(betegnelse,maatname), stdmname=betegnelse from metunit where ((metunit.primsel=1) and (mtype='Lengde') and (jarowinkler(betegnelse,maatname)>jwval) and jarowinkler(betegnelse,maatname)>0.90);

UPDATE communit set maatnum = 1.0, jwval=jarowinkler('Stk',maatname), stdmname='Stykker' where (jarowinkler('Stk',maatname)>0.90) and (jarowinkler('Stk',maatname)>jwval);
UPDATE communit set maatnum = 12.0, jwval=jarowinkler('Tylter',maatname) where (jarowinkler('Tylter',maatname)>0.90) and (jarowinkler('Tylter',maatname)>jwval);
UPDATE communit set volnum = 231.6, jwval=jarowinkler('Oksehoved',maatname) where (jarowinkler('Oksehoved',maatname)>0.90) and (jarowinkler('Oksehoved',maatname)>jwval);
UPDATE communit set volnum = 138.97, jwval=jarowinkler('Tønder',maatname) where (jarowinkler('Tønder',maatname)>0.90) and (jarowinkler('Tønder',maatname)>jwval);
UPDATE communit set weighnum = 2600, jwval=jarowinkler('Lester',maatname) where (jarowinkler('Lester',maatname)>0.85) and (jarowinkler('Lester',maatname)>jwval);
UPDATE communit set weighnum = 2600, jwval=jarowinkler('Læst',maatname) where (jarowinkler('Læst',maatname)>0.85) and (jarowinkler('Læst',maatname)>jwval);
UPDATE communit set valnum = 1.0, jwval=jarowinkler('Rd',maatname), stdmname='Rdlr' where (jarowinkler('Rd',maatname)>0.90) and (jarowinkler('Rd',maatname)>jwval);
UPDATE communit set valnum = 1.0, jwval=jarowinkler('Rigsdal',maatname), stdmname='Rdlr' where (jarowinkler('Rigsdal',maatname)>0.90) and (jarowinkler('Rigsdal',maatname)>jwval);
UPDATE communit set weighnum = 1016, jwval=jarowinkler('Tons',maatname)  where (jarowinkler('Tons',maatname)>0.90) and (jarowinkler('Tons',maatname)>jwval);
UPDATE communit set weighnum = 6.35, jwval=jarowinkler('Steen',maatname)  where (jarowinkler('Steen',maatname)>0.90) and (jarowinkler('Steen',maatname)>jwval);

Load Nina Bangs classification system for commodities. A sample of the data can be viewed here



--
-- Create table nbclass
--
-- "eastwest";"nbclass";"nbclsname";"french";"german";"english";"nbchar"
-- "EB";"1";"Salt ";"sel";"Salz";"salt";"001"


DROP TABLE IF EXISTS "nbclass"  CASCADE;
CREATE TABLE "nbclass" (
  direction char(2),
  nbclass integer,
  nbclsname varchar(255),
  french varchar(255),
  german varchar(255),
  english varchar(255),
  nbchar char(3),
  PRIMARY KEY (nbclass)
);

CREATE INDEX ON nbclass (lower(nbclsname));

COPY nbclass (direction,nbclass,nbclsname,french, german, english, nbchar) FROM '/home/arne/STRM/data/nbclass-export.txt' DELIMITER ';' CSV HEADER QUOTE '"' ENCODING 'utf-8' NULL '';
Load Nina Bangs coding system system for commodities. 
A sample of the data can be [http://solli.h.uib.no/showtab.php?tabname=nbcode viewed here]
--
-- Create table nbcode
--
-- "eastweast";"nbcode";"nbvare";"nbclass";"Level3"
-- "EB";"001.001";"Salt";1,00;0,00


DROP TABLE IF EXISTS "nbcode"  CASCADE;
CREATE TABLE "nbcode" (
  direction char(2),
  nbcode char (7),
  nbvare varchar(255),
  nbclass integer,
  level3 integer,
  PRIMARY KEY (nbcode)
);

CREATE INDEX ON nbcode (lower(nbvare));

COPY nbcode (direction,nbcode,nbvare,nbclass, level3) FROM '/home/arne/STRM/data/nbcode-export.txt' DELIMITER ';' CSV HEADER QUOTE '"' ENCODING 'utf-8' NULL '';

Alter table nbcode add column altname varchar(255);
update nbcode set altname = replace(nbvare,'æ','e') where nbvare like ('%æ%');
update nbcode set altname = replace(nbvare,'Æ','E') where nbvare like ('%Æ%');


Update commodites coding table according to Load Nina Bangs coding and classification system using




UPDATE commodities set nbclass = nbclass.nbclass FROM nbclass where (commodities.soortname = nbclass.nbclsname);
update commodities set jwval=0.0;
UPDATE commodities set nbcode = nbcode.nbcode, jwval=1.0, nbclass=nbcode.nbclass FROM nbcode where (commodities.soortname = nbcode.nbvare);

-- Update ballast
--
update commodities set nbclass = -1, jwval=jarowinkler(commodities.soortname,'Ballast'), nbcode='-1', ngcode=-1 where ((jarowinkler(commodities.soortname,'Ballast')>jwval) and (jarowinkler(commodities.soortname,'Ballast')>0.93) and (commodities.soortname like '%Ballast%'));

-- Update on commodity name
--
UPDATE commodities set nbcode = nbcode.nbcode, jwval=jarowinkler(commodities.soortname,nbcode.nbvare), nbclass=nbcode.nbclass FROM nbcode where ((commodities.freq >0) and (jarowinkler(commodities.soortname,nbcode.nbvare)>jwval) and (jarowinkler(commodities.soortname,nbcode.nbvare)>0.89));

-- Update on commodity alternate name, æ/e problem
--
UPDATE commodities set nbcode = nbcode.nbcode, jwval=jarowinkler(commodities.soortname,nbcode.altname), nbclass=nbcode.nbclass FROM nbcode where ((commodities.freq >0) and (jarowinkler(commodities.soortname,nbcode.altname)>jwval) and (jarowinkler(commodities.soortname,nbcode.altname)>0.89) and not (altname=''));

-- Update on split of commodity , first product, " og " and " a " problem
--
UPDATE commodities set nbcode = nbcode.nbcode, jwval=jarowinkler(commodities.prod1,nbcode.nbvare), nbclass=nbcode.nbclass FROM nbcode where ((commodities.freq >0) and (jarowinkler(commodities.prod1,nbcode.nbvare)>jwval) and (jarowinkler(commodities.prod1,nbcode.nbvare)>0.89));

Coding raw data


--
-- Table structure for table 'cdoorvaart'
--

DROP TABLE IF EXISTS "cdoorvaart";

CREATE TABLE "cdoorvaart" (
  "id_doorvaart" INTEGER PRIMARY KEY NOT NULL,
  "dag"  INTEGER,
  "maand"  INTEGER,
  "jaar"  INTEGER,
  "volgnummer"  integer,
  "smhport"  CHAR(6),
  "totaal_daalder"  double precision,
  "totaal_skilling"  double precision,
  "tottax" double  precision,
  "schipper_voornamen"  CHAR(4),
  "schipper_patroniem"  CHAR(4),
  "schipper_achternaam"  CHAR(4),
  vanfrom char(6),
  naarto char(6),
  direction char(2),
  export integer
);

INSERT into cdoorvaart (id_doorvaart, dag, maand, jaar, volgnummer) select id_doorvaart, dag, maand, jaar, volgnummer::int from doorvaart;

UPDATE cdoorvaart SET totaal_daalder =  cast(doorvaart.totaal_daalder as double precision) FROM doorvaart WHERE ((cdoorvaart.id_doorvaart=doorvaart.id_doorvaart) and (check_numeric(doorvaart.totaal_daalder)));

UPDATE cdoorvaart SET totaal_skilling =  cast(doorvaart.totaal_skilling as double precision) FROM doorvaart WHERE ((cdoorvaart.id_doorvaart=doorvaart.id_doorvaart) and (check_numeric(doorvaart.totaal_skilling)));

UPDATE cdoorvaart SET totaal_daalder = frac2num(doorvaart.totaal_daalder) from doorvaart where ((cdoorvaart.id_doorvaart=doorvaart.id_doorvaart) and (doorvaart.totaal_daalder<>'') and not (check_numeric(doorvaart.totaal_daalder)));

UPDATE cdoorvaart SET totaal_skilling =  frac2num(doorvaart.totaal_skilling) FROM doorvaart WHERE ((cdoorvaart.id_doorvaart=doorvaart.id_doorvaart) and (doorvaart.totaal_skilling<>'') and not (check_numeric(doorvaart.totaal_skilling)));


UPDATE cdoorvaart set tottax = totaal_daalder + (totaal_skilling/96);

UPDATE cdoorvaart set smhport = smhport.plccod FROM smhport, doorvaart WHERE ((doorvaart.schipper_plaatsnaam = smhport.name) and (doorvaart.id_doorvaart=cdoorvaart.id_doorvaart));


--
-- Table structure for table 'clading'
--

DROP TABLE IF EXISTS "clading" CASCADE;


CREATE TABLE "clading" (
  "id_doorvaart" INTEGER NOT NULL,
  "regel" INTEGER NOT NULL,
  "vanfrom" CHAR(6),
  "naarto" CHAR(6),
  "maat" double precision,
  "aantal" double precision,
  "ebsoort" CHAR(7),
  "wbsoort" CHAR(7),
  "daalder" double precision,
  "skilling" double precision,
  "rddeci" double precision,
  "metmeasure" double precision,
  siunit CHAR(3);
  year integer,
  direction char(2),
  export integer,
  PRIMARY KEY (id_doorvaart, regel)
);

INSERT into clading (id_doorvaart, regel) select id_doorvaart, regel from lading;

UPDATE clading SET daalder =  cast(lading.daalder as double precision) FROM lading WHERE ((clading.id_doorvaart=lading.id_doorvaart) and (clading.regel=lading.regel) and (check_numeric(lading.daalder)));

UPDATE clading SET daalder =  frac2num(lading.daalder) FROM lading WHERE ((clading.id_doorvaart=lading.id_doorvaart) and (clading.regel=lading.regel)  and (lading.daalder<>'') and not (check_numeric(lading.daalder)));

UPDATE clading SET skilling =  cast(lading.skilling as double precision) FROM lading WHERE ((clading.id_doorvaart=lading.id_doorvaart)and (clading.regel=lading.regel) and (check_numeric(lading.skilling)));

UPDATE clading SET skilling =  frac2num(lading.skilling) FROM lading WHERE ((clading.id_doorvaart=lading.id_doorvaart)and (clading.regel=lading.regel)  and (lading.skilling<>'') and not (check_numeric(lading.skilling)));


UPDATE clading set rddeci = daalder + (skilling/96);
update clading set rddeci = daalder where skilling is null;
UPDATE clading set rddeci = (skilling/96) where daalder is null;


UPDATE clading SET aantal =  cast(lading.aantal as double precision) FROM lading WHERE ((clading.id_doorvaart=lading.id_doorvaart) and (lading.regel=clading.regel) and (check_numeric(lading.aantal)));

UPDATE clading SET aantal =  frac2num(lading.aantal) FROM lading WHERE ((clading.id_doorvaart=lading.id_doorvaart) and (lading.regel=clading.regel) and (lading.aantal<>'') and not (check_numeric(lading.aantal)));

UPDATE clading set vanfrom = portdept.plccod FROM portdept, lading WHERE ((lading.id_doorvaart=clading.id_doorvaart) and (lading.regel=clading.regel) and (lading.van = portdept.name));

UPDATE clading set naarto = portdest.plccod FROM portdest, lading WHERE ((lading.id_doorvaart=clading.id_doorvaart) and (lading.regel=clading.regel) and (lading.naar = portdest.name));

UPDATE clading set direction = concat(substr(uniqport.eastwest,1,1), 'B') FROM uniqport WHERE (clading.naarto=uniqport.plccod);

UPDATE clading set maat = communit.maatnum, siunit='stk' FROM communit, lading WHERE ((lading.id_doorvaart=clading.id_doorvaart) and (lading.regel=clading.regel) and (lading.maat = communit.maatname) and (communit.maatnum>0));

UPDATE clading set maat = communit.volnum, siunit='L' FROM communit, lading WHERE ((lading.id_doorvaart=clading.id_doorvaart) and (lading.regel=clading.regel) and (lading.maat = communit.maatname) and (communit.volnum>0));

UPDATE clading set maat = communit.weighnum, siunit='kg' FROM communit, lading WHERE ((lading.id_doorvaart=clading.id_doorvaart) and (lading.regel=clading.regel) and (lading.maat = communit.maatname) and (communit.weighnum>0));

UPDATE clading set maat = communit.lennum, siunit='m' FROM communit, lading WHERE ((lading.id_doorvaart=clading.id_doorvaart) and (lading.regel=clading.regel) and (lading.maat = communit.maatname) and (communit.lennum>0));

UPDATE clading SET metmeasure = aantal * maat;

Update clading set year = jaar from cdoorvaart where (clading.id_doorvaart=cdoorvaart.id_doorvaart);

-- cdoorvaart

update cdoorvaart set vanfrom = clading.vanfrom, naarto = clading.naarto from clading where ((clading.id_doorvaart=cdoorvaart.id_doorvaart) and (clading.regel=1));


UPDATE cdoorvaart set direction = concat(substr(uniqport.eastwest,1,1), 'B') FROM uniqport WHERE (cdoorvaart.naarto = uniqport.plccod);

-- direction and Nina Bang coding


update clading set ebsoort = nbcode from commodities, lading WHERE ((lading.id_doorvaart=clading.id_doorvaart) and (lading.regel=clading.regel) and (lading.soort = commodities.soortname)); 

update clading set wbsoort = nbcode from commodities, lading WHERE ((lading.id_doorvaart=clading.id_doorvaart) and (lading.regel=clading.regel) and (lading.soort = commodities.soortname)); 


--
-- Table structure for table 'cbelasting'
--

DROP TABLE IF EXISTS "cbelasting";

CREATE TABLE "cbelasting" (
  "id_doorvaart" INTEGER NOT NULL,
  "regel" INTEGER NOT NULL,
  "naam" CHAR(2),
  "muntsoort" CHAR(2),
  "hoofdbedrag" double precision,
  "subbedrag" double precision,
  "korting" CHAR(2),
  "totbedrag" double  precision,
  export integer,
  PRIMARY KEY (id_doorvaart, regel)
);

INSERT into cbelasting (id_doorvaart, regel) select id_doorvaart, regel from belasting;

UPDATE cbelasting SET hoofdbedrag =  cast(belasting.hoofdbedrag as double precision) FROM belasting WHERE ((cbelasting.id_doorvaart=belasting.id_doorvaart) and (cbelasting.regel=belasting.regel) and (check_numeric(belasting.hoofdbedrag)));

UPDATE cbelasting SET subbedrag =  cast(belasting.subbedrag as double precision) FROM belasting WHERE ((cbelasting.id_doorvaart=belasting.id_doorvaart) and (cbelasting.regel=belasting.regel) and (check_numeric(belasting.subbedrag)));

UPDATE cbelasting set totbedrag = hoofdbedrag + subbedrag where ((hoofdbedrag is not null) and (subbedrag is not null));
UPDATE cbelasting set totbedrag = hoofdbedrag where (subbedrag is null);
UPDATE cbelasting set totbedrag = subbedrag where (hoofdbedrag is null);

Extraction of a sample

This example show how every passage that have cargoes to Norway, Spain and Portugal are extracted into the sample tables

--
-- Set export indicators
--
update cbelasting set export=0;
update cdoorvaart set export=0;
update clading set export=0;

update uniqport set primsel=0;
update uniqport set primsel=1 where ((uniqport.region='Norway') or (uniqport.region='Spain') or (uniqport.region='Portugal'));

update clading set export=1 from uniqport where ((clading.naarto=uniqport.plccod) and (uniqport.primsel=1));
update cdoorvaart set export=1 from clading where ((cdoorvaart.id_doorvaart=clading.id_doorvaart) and (clading.export=1));
update cbelasting set export=1 from cdoorvaart where ((cbelasting.id_doorvaart=cdoorvaart.id_doorvaart) and (cdoorvaart.export=1));

--
-- Make export tables
--   
drop table IF EXISTS ctax cascade;
drop table IF EXISTS cpassage cascade;
drop table IF EXISTS ccargo cascade;

create table ctax as select * from cbelasting where export=1;
create table cpassage as select * from cdoorvaart where export=1;
create table ccargo as select * from clading where export=1;

Classification systems by Nina Bang

  • Nina E. Bangs classification of commodities converted into a database table.
    • Download the classification system as XLSX. Download as DBF.
    • Download the list of products as XLS. Download as DBF.
  • Nina E. Bangs coding of ports (departure, port of intended destination, shipmaster home port)
    • Converted into a database table. Download as XLS. Download as DBF.
    • Download the list of ports as geocoded GIS file (ArcGIS). Download as DBF