Is there an easy way to import/export full-text fields as files? One that solves the "load as multiple lines" problem: with SQL's COPY I can only transform a full file into a full table, not into a single text field, because COPY treats each line as a row. It must also solve the save-back problem: saving the full XML file to the filesystem without any change in binary representation (preserving its SHA1), and without external procedures (such as Unix sed).
The main problem is on export, hence the title of this question.
PS: the "proof of same file" for the round trip (import, export back, and compare with the original) can be obtained with sha1sum; see the examples below. So a natural demand is also to check the SHA1 by SQL, to avoid exporting for simple check tasks.
All examples:

1. Import a full text file into a full table (not what I need), and test that it can be exported back as the same text. PS: I need to import one file into one field of one row.

2. Transform the full table into one file (not what I need), and test that it can be exported as the same text. PS: I need one row (of one field) written to one file.

3. Calculate the hash by SQL: the SHA1 of the field. It must be the same when compared with the file's hash... else it is not a solution for me.
The following examples show each problem and an inelegant workaround.
1. Import
CREATE TABLE ttmp (x text);
COPY ttmp FROM '/tmp/test.xml' (FORMAT text);  -- breaks the file into lines
COPY (SELECT x FROM ttmp) TO '/tmp/test_back.xml' (FORMAT text);
Checking that original and "back" have exactly the same content:
sha1sum /tmp/test*.*
570b13fb01d38e04ebf7ac1f73dfad0e1d02b027 /tmp/test_back.xml
570b13fb01d38e04ebf7ac1f73dfad0e1d02b027 /tmp/test.xml
PS: this seems perfect, but the problem here is the use of many rows. A real import solution would load the file into one row (and one field). A real export solution would be an SQL function that produces test_back.xml from a single row (of a single field).
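For reference, a hedged sketch of a single-row import using the built-in pg_read_file() (a sketch only, not verified here: it needs a superuser role, and path restrictions vary by PostgreSQL version, as the answer below notes):

```sql
CREATE TABLE onerow (x text);
-- pg_read_file() returns the whole file as one value, so the result is
-- one row and one field, instead of one row per line as with COPY:
INSERT INTO onerow (x) SELECT pg_read_file('/tmp/test.xml');
```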
2. Transform full table into one file
Aggregate the rows and store the result as XML:
CREATE TABLE xtmp (x xml);
INSERT INTO xtmp (x) SELECT array_to_string(array_agg(x), E'\n')::xml FROM ttmp;
COPY (SELECT x::text FROM xtmp) TO '/tmp/test_back2-bad.xml' (FORMAT text);
... But it does not work, as we can check with sha1sum /tmp/test*.xml: it does not produce the same result for test_back2-bad.xml.
So we also need to translate the literal \n back to chr(10), using an external tool (perl, sed, or any other): perl -p -e 's/\\n/\n/g' /tmp/test_back2-bad.xml > /tmp/test_back2-good.xml
Ok, now test_back2-good.xml has the same hash ("570b13fb..." in my example) as the original. Using Perl is a workaround; how can it be done without one?
3. The SHA1 of the field
SELECT encode(digest(x::text::bytea, 'sha1'), 'hex') FROM xtmp;
Not solved: it is not the same hash as the original (the "570b13fb..." of my example)... Perhaps the ::text cast enforces an internal representation with \n symbols, so a solution would be a direct cast to bytea, but that is an invalid cast. The other workaround is not a solution either:
SELECT encode(digest( replace(x::text,'\n',E'\n')::bytea, 'sha1' ), 'hex') FROM xtmp
... I also tried CREATE TABLE btmp (x bytea) and COPY btmp FROM '/tmp/test.xml' (FORMAT binary), but it fails with the error "unknown COPY file signature".
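One possible explanation, offered as an assumption rather than a verified diagnosis: array_to_string() only puts E'\n' between lines, so the value stored in xtmp is missing the original file's trailing newline, and the digests differ by exactly that one byte. A sketch of the check, assuming the pgcrypto extension is installed (untested here):

```sql
-- Assumption: the stored value lost only the file's final newline,
-- so re-appending it before hashing should reproduce the file's SHA1.
SELECT encode(digest(convert_to(x::text || E'\n', 'UTF8'), 'sha1'), 'hex')
FROM xtmp;
```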
1 Answer
COPY isn't designed for this. It's meant to deal with table-structured data, so it can't work without some way of dividing rows and columns; there will always be some characters which COPY FROM interprets as separators, and for which COPY TO will insert some escape sequence if it finds one in your data. This isn't great if you're looking for a general file I/O facility.
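To make the escaping concrete, here is a small illustration (a simplified subset of the real COPY text-format rules, not a faithful reimplementation) showing why a value containing newlines cannot round-trip through COPY byte-for-byte without an unescaping step:

```python
def copy_text_escape(value: str) -> str:
    """Escape a value roughly the way COPY (FORMAT text) does: backslash
    first, then embedded newlines and tabs become two-character sequences."""
    return (value.replace("\\", "\\\\")
                 .replace("\n", "\\n")
                 .replace("\t", "\\t"))

def copy_text_unescape(field: str) -> str:
    """Invert copy_text_escape, turning the literal sequences back into
    the original control characters."""
    out, i = [], 0
    while i < len(field):
        if field[i] == "\\" and i + 1 < len(field):
            out.append({"n": "\n", "t": "\t", "\\": "\\"}.get(field[i + 1],
                                                             field[i + 1]))
            i += 2
        else:
            out.append(field[i])
            i += 1
    return "".join(out)

xml = "<a>\n\t<b/>\n</a>\n"
escaped = copy_text_escape(xml)      # one physical line, with literal \n and \t
assert "\n" not in escaped
assert copy_text_unescape(escaped) == xml
```

The escaped form is what ends up in the exported file, which is why the sha1sum differs until the escaping is undone.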
In fact, database servers aren't designed for general file I/O. For one thing, anything which interacts directly with the server's file system will require a superuser role. If at all possible, you should just query the table as usual, and deal with the file I/O on the client side.
That said, there are a few alternatives:
- The built-in pg_read_file() function, and pg_file_write() from the adminpack module, provide the most direct interface to the file system, but they're both restricted to the cluster's data directory (and I wouldn't recommend storing random user-created files in there).
- lo_import() and lo_export() are the only built-in functions I know of which deal directly with file I/O and which have unrestricted access to the server's file system (within the constraints imposed by the host OS), but the Large Object interface is not particularly user-friendly...
- If you install the untrusted variant of a procedural language like Perl (plperlu) or Python (plpythonu), you can write wrapper functions for that language's native I/O routines.
- There isn't much you can't accomplish via COPY TO PROGRAM if you're determined enough - for one, you could COPY (SELECT 1) TO PROGRAM 'mv <source_file> <target_file>' to work around the limitations of pg_file_write() - though this blurs the line between SQL and external tools somewhat (and whoever inherits your codebase will likely not be impressed...).
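A minimal sketch of the lo_import()/lo_export() route from the second bullet (untested here; it requires appropriate privileges, and the :oid psql variable stands in for whatever OID lo_import() actually returns):

```sql
-- Read the whole file into a new large object; returns its OID:
SELECT lo_import('/tmp/test.xml');
-- Write the same bytes back out, with no row/column escaping involved:
SELECT lo_export(:oid, '/tmp/test_back.xml');
-- Remove the large object once it is no longer needed:
SELECT lo_unlink(:oid);
```

Because the large-object functions move raw bytes rather than rows, this path should preserve the SHA1, at the cost of the clunkier interface the answer warns about.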