Monday, August 27, 2018

Add PDF metadata with accents in Python


I want to change the metadata of a PDF file using this code:

    from PyPDF2 import PdfFileReader, PdfFileWriter

    title = "Vice-présidence pour l'éducation"
    fin = open(filename, 'rb')
    reader = PdfFileReader(fin)
    writer = PdfFileWriter()
    writer.appendPagesFromReader(reader)
    metadata = reader.getDocumentInfo()

    metadata.update({'/Title':title})

    writer.addMetadata(metadata)

    fout = open(filename, 'wb')
    writer.write(fout)

    fin.close()
    fout.close()

It works fine if the title is in English (no accents), but when it has accents I get the following error:

TypeError: createStringObject should have str or unicode arg 

How can I add a title with accents to the metadata?

Thank you

1 Answer


The only way to get this error message is to pass a value of the wrong type as the string parameter of the library's own createStringObject(string) function.

It checks for a text string or bytes type, using these definitions in utils.py:

    import builtins
    bytes_type = type(bytes())  # Works the same in Python 2.X and 3.X
    string_type = getattr(builtins, "unicode", str)
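To see what these two names resolve to in a given interpreter, the check can be approximated outside the library. This is only a sketch, assuming Python 3: `is_accepted` is a hypothetical helper written for illustration, not a PyPDF2 function, and it mirrors the type test described above rather than reproducing the library's code.

```python
import builtins

# Same definitions as the ones quoted from utils.py above.
bytes_type = type(bytes())
string_type = getattr(builtins, "unicode", str)  # no "unicode" on Python 3, so this falls back to str


def is_accepted(value):
    """Hypothetical helper: True if value is one of the two types the check allows."""
    return isinstance(value, (string_type, bytes_type))


title = "Vice-présidence pour l'éducation"

# Print the resolved types and whether each candidate value would pass the check.
print(string_type, bytes_type)
print(type(title), is_accepted(title))
print(type([1, 2, 3]), is_accepted([1, 2, 3]))
```

Running the same lines in the environment that raises the error should show whether title really has the type you expect there.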

I can only reproduce your error if I rewrite your code with an obviously wrong type, like this (the code is rewritten to use a with statement, but only the line marked with a comment matters):

    from PyPDF2 import PdfFileReader, PdfFileWriter

    with open(inputfile, "rb") as fr, open(outputfile, "wb") as fw:
        reader = PdfFileReader(fr)
        writer = PdfFileWriter()

        writer.appendPagesFromReader(reader)
        metadata = reader.getDocumentInfo()

        # metadata.update({'/Title': "Vice-présidence pour l'éducation"})
        metadata.update({'/Title': [1, 2, 3]})  # <- wrong type here !
        writer.addMetadata(metadata)

        writer.write(fw)

It seems that the type of your string title = "Vice-présidence pour l'éducation" does not match whatever bytes_type or string_type resolves to. Either the title variable has an unexpected type (which I cannot see in your code, perhaps because it was lost when you reduced it to an MCVE), or bytes_type or string_type is not resolving to the types the library author intended (this could be a bug in the library or a broken installation; it's hard for me to tell).

Without reproducible code, it's hard to provide a solution, but hopefully this points you in the right direction. It may be enough to convert your string to whatever type bytes_type or string_type resolves to. Other solutions would have to be made in the library itself, or would simply be hacks.
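As an illustration of converting the string before handing it to the library, here is a minimal sketch. It assumes Python 3 and assumes that the title arrives as UTF-8 encoded bytes, neither of which I can confirm from the question. `to_text` is a hypothetical helper, not part of PyPDF2, and the PyPDF2 calls are left out so the snippet runs on its own.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return value as a text string, decoding bytes if needed."""
    if isinstance(value, str):
        return value
    if isinstance(value, (bytes, bytearray)):
        # Assumption: the bytes are encoded as UTF-8 unless told otherwise.
        return value.decode(encoding)
    raise TypeError("expected text or bytes, got %s" % type(value).__name__)


# Simulated input: the title as it might arrive from a file or another API.
raw_title = "Vice-présidence pour l'éducation".encode("utf-8")

title = to_text(raw_title)
metadata_update = {"/Title": title}  # what would be passed to metadata.update(...)
print(type(title), title)
```

Whether this removes the error depends on what type title actually has in your environment, so treat it as a starting point for debugging rather than a confirmed fix.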
