unicode - PDF: Duplicate font names with different ToUnicode CMaps

- June 15, 2011

I'm parsing a PDF file and extracting some of its text, and I've run into a situation where I encounter a font dictionary named "c2_0" that contains a CIDFont (Type 0) and a ToUnicode CMap. So far, no problem: I have tools to parse the ToUnicode CMap and map the 2-byte character codes to Unicode values.
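For readers who want to see what that mapping step looks like, here is a minimal sketch in Python. It is not the tool used in the question; the function names `parse_tounicode` and `decode_text` are made up for illustration, and it assumes the CMap stream is already decompressed to text, that destination values are UTF-16BE hex strings, and that every character code in the content stream is exactly 2 bytes.

```python
import re

HEX = r"<([0-9A-Fa-f]+)>"

def _utf16(hex_string):
    # Assumption: destination values are UTF-16BE encoded hex strings.
    return bytes.fromhex(hex_string).decode("utf-16-be")

def parse_tounicode(cmap_text):
    """Build {character code: Unicode string} from bfchar/bfrange sections."""
    mapping = {}
    # bfchar sections hold simple <code> <unicode> pairs.
    for body in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(HEX + r"\s*" + HEX, body):
            mapping[int(src, 16)] = _utf16(dst)
    # bfrange sections hold <lo> <hi> <start> or <lo> <hi> [<u1> <u2> ...].
    entry = HEX + r"\s*" + HEX + r"\s*(?:" + HEX + r"|\[(.*?)\])"
    for body in re.findall(r"beginbfrange(.*?)endbfrange", cmap_text, re.S):
        for lo, hi, dst, array in re.findall(entry, body, re.S):
            codes = range(int(lo, 16), int(hi, 16) + 1)
            if dst:
                base = _utf16(dst)
                for offset, code in enumerate(codes):
                    # Only the last character of the start value is incremented.
                    mapping[code] = base[:-1] + chr(ord(base[-1]) + offset)
            else:
                for code, item in zip(codes, re.findall(HEX, array)):
                    mapping[code] = _utf16(item)
    return mapping

def decode_text(raw, mapping):
    """Decode a byte string of 2-byte codes; unknown codes become U+FFFD."""
    codes = [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]
    return "".join(mapping.get(code, "\ufffd") for code in codes)

sample_cmap = """
1 beginbfchar
<0003> <0020>
endbfchar
2 beginbfrange
<0024> <0026> <0041>
<0030> <0031> [<0061> <0062>]
endbfrange
"""
sample_mapping = parse_tounicode(sample_cmap)
sample_text = decode_text(b"\x00\x24\x00\x03\x00\x25", sample_mapping)
```

A real ToUnicode CMap also carries codespace ranges and may use codes of other widths, which this sketch ignores.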

But the PDF file later includes another font dictionary object, also called "c2_0", which contains a different ToUnicode CMap. I didn't know how I should handle the second CMap, so I guessed and combined the entries of both CMaps. That worked, and the text was extracted correctly.

But I can't find anything in the PDF Reference manual that says this is allowed, or that even addresses this situation. I would have thought duplicate font names would lead to unspecified behavior, or at the very least have the second override the first or something. I tried combining them as a longshot guess, and was surprised that it worked.

Does anyone have experience with this? Does anyone know if a PDF is allowed to have duplicate font names that refer to different objects with different CMaps that "combine" when invoked with the Tf operator?

c2_0 is a symbolic name in the /Font resource dictionary and it has local scope: it is used only in the content stream that the resource dictionary belongs to. If c2_0 appears in more than one /Font resource dictionary, that's not a problem.

If you have two c2_0 entries in the same /Font resource dictionary, such as /c2_0 X 0 R /c2_0 Y 0 R, then you have a problem, because the behavior is undefined and it is up to you how to handle the situation.

Symbolic name resolution works like this: if you are in a page content stream, you search for the font's symbolic name (the Tf operand) in the page's resources dictionary. If you cannot locate it, you go up the page tree and search the resources dictionary (if one exists) of each parent page node. If you reach the top of the page tree and have not found the font, the behavior is undefined. At that point you can implement various fallback strategies: you can use a default font, you can search the resources included in form XObjects on the page, or you can search the resources dictionaries of other pages.
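The lookup described above can be sketched in Python as follows. This is only an illustration under stated assumptions: page tree nodes are modelled as plain dicts with hypothetical "Resources" and "Parent" keys, indirect references are assumed to be already resolved, and the function name `resolve_font` and the object ids are invented for the example.

```python
def resolve_font(page, name):
    """Search a page, then each parent page tree node, for a font name.

    Returns the font object, or None when the top of the page tree is
    reached without a match (the caller then picks a fallback strategy).
    """
    node = page
    while node is not None:
        # Each node may or may not carry its own /Resources /Font dictionary.
        fonts = node.get("Resources", {}).get("Font", {})
        if name in fonts:
            return fonts[name]
        node = node.get("Parent")
    return None

# Two pages reuse the name c2_0 for different font objects (ids are made up).
root = {"Resources": {"Font": {"F1": {"id": 3}}}}
page1 = {"Parent": root, "Resources": {"Font": {"c2_0": {"id": 10}}}}
page2 = {"Parent": root, "Resources": {"Font": {"c2_0": {"id": 20}}}}

font_on_page1 = resolve_font(page1, "c2_0")
font_on_page2 = resolve_font(page2, "c2_0")
inherited_font = resolve_font(page2, "F1")
missing_font = resolve_font(page1, "c9_9")
```

Because the same name resolves to different objects depending on which content stream is being processed, one option is to cache each parsed ToUnicode CMap per resolved font object rather than per name, so the entries of unrelated fonts never need to be combined.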

pdf unicode


Kamlesh
