gw2sqlA GeneWeb to SQL converterby Yann Corno - last update: 2009-06-07 |
Contents
gw2sql converts a GeneWeb GW file into an SQL file that can be read in any SQL database.
GW files, called "GeneWeb Source Files", are created by the gwu
tool that is included with GeneWeb.
See the GeneWeb documentation for details about this file format.
It is a PERL script, so it can be used on any platform. My original purpose was to have an automatic job running once a day that would dump all my GeneWeb database into MySQL, so that information could be processed easily: running statistics, compiling information, running other scripts (like a birthday mailing list), etc.
perl gw2sql.pl [-i infile.gw] [-o outfile.sql] [--nocreate|-n] [--nodata] [--mysql3] [--relax]
Parameters in [brakets] are optional.
-i yourfile.gw name of your GW input file. If omitted, standard input is used. -o yourfile.sql name of the output SQL file. If omitted, standard output is used. -n, --nocreate do not ouput the tables creation instructions. --nodata any gw file or standard input. Use this option to only output the table creation instructions. Useful when using sql2gw. --mysql3 use if you have an UTF-8 encoded file, and that your SQL database does not support the ALTER TABLE db DEFAULT CHARACTER SET uft8; instruction (supported by MySQL only from v. 4.0). --relax skip lines that could not be understood (juts issue a warning). By default, it fails when an unknown line is encountered.
The following example makes a complete transfer of a GeneWeb database into a fresh MySQL one (under Unix or similar):
gwu YourGeneWebBase | perl gw2sql.pl | mysql yourDB
Windows users will have to use files:
gwu YourGeneWebBase -o YourGeneWebBase.gw perl gw2sql.pl -i YourGeneWebBase.gw -o YourGeneWebBase.sql Then import YourGeneWebBase.sql in your SQL database.
gw2sql converts a GeneWeb GW file into an SQL file that can be read in any SQL database.
It starts by printing a list of table creation instructions, defining the tables structures. (careful: there are DROP TABLES instructions, so previous versions of these tables will be trashed). You may want to modify this code to suit your own needs, such as adding indexes in these tables, or changing their prefix (see the $dbprefix variable), etc.
There are several tables created:
About referencing a person:
Because the GW format does not provide the person GeneWeb ID, :-(
we have to rely on the uniqueness of the triplet Lastname + Fistname + Number. Fortunately, GeneWeb
makes a good job at it in its GW format, so it works pretty well. So the database solution was to create
a primary key in the gw_person table that combines the 3 values together.
About dates:
As you might know if you read the GW documentation, the GeneWeb format for dates is
quite extensive. To address most uses, the dates fields are doubled. The GeneWeb date is put 'as is'
in a first field. Then, if the format is a simple and complete one (i.e. dd/mm/yyyy), it is copied
in SQL format in a second field. See the 'birth_date' and 'birth_date_sql' fields as an example. These
'*_sql' fields are indexed, and will allow easy searches. If the GeneWeb date could not be converted,
the SQL field contains NULL.
About unknown persons:
GeneWeb uses a question mark '?' to indicate an unknown last name or first name.
To ensure uniqueness of persons, it is assumed that each such person is a different one from the others.
Therefore, the last name and/or first name are given a unique number after the '?'. So you will end up
with gw_person records that have last names like '?1234', etc.
Note:
This first version is not intended to go deeply into the GeneWeb data structures (like dates).
It is merely a translation tool, and the database structure is really not sophisticated. As you will
see, you end up with hollow files (lots of NULL fields). It is then up to you to use your own SQL
skills to extract or compile the information in a way that fits your needs. SQL is quite convenient for
this.
Download gw2sql.pl (40 KB)
See header of script file.
Special thanks to my friend Erich from Germany, who kindly tested my script, uncovering many bugs. There is no good programmer without a good tester, and Erich is a great one!
This program is made available under the GeneWeb GPL Licence. GeneWeb is a software developed by Daniel de Rauglaudre, and is © Copyright INRIA. You may change and distribute this work as long as you comply with the GeneWeb GPL Licence.
IMPORTANT NOTICE: THIS WORK IS PROVIDED FOR EXPERIMENTAL PURPOSES ONLY. THERE ARE ABSOLUTELY NO WARRANTY WHATSOEVER THAT COMES WITH THIS SOURCE CODE AND THE INFORMATION DESCRIBED IN THIS PAGE. YOU MAY LOOSE YOUR DATA AND/OR HAVE COMPUTER HARDWARE OR SOFTWARE DAMAGES, OR SECURITY BREACHES. YOU ARE USING ALL THIS AT YOUR OWN RISKS. YOU HAVE BEEN WARNED !
You feed back or source code contributions are welcome! Contact me at . Have fun.
Yann Corno
Back to Yann's GeneWeb Tools