This section of the report shares the results of identifying recurrent accessibility issues, per category of ebook, that will need remediation to meet EAA requirements. We first define the target, our methodology and the scoring threshold established for this analysis, as well as the identified biases and limits. We then present the results we deemed useful for the next steps of the ABE Lab project, with a list of the recurrent accessibility issues detected. Lastly, we describe the outcomes of this work and the ebook classification we developed, which will be used to test remediation tools and workflows.
The European Accessibility Act (EAA) 1 requirements for ebooks are listed in Annex I, Sections III and IV (f). EPUB Accessibility - EU Accessibility Act Mapping 2 is a W3C group note that shows how EPUB files conforming to the EPUB accessibility guidelines (EPUB Accessibility 1.1 and WCAG 2.1 AA) meet the European Accessibility Act requirements. These two documents help us define our target and the basis for establishing accessibility deficiencies.
Pre-paginated ebooks (like PDFs and Fixed Layout EPUBs) do not comply with the EAA requirements for the criterion of flexibility and choice in the presentation of the content 3 , a key functionality for persons facing cognitive difficulties (like dyslexia) or with sight impairments. As remediation for these types of ebooks would mean a change of format, we chose to introduce an intermediate remediation target, so that remediation can be studied within the current state of each format and the possibilities offered by remediation tools. The target for these documents will be compliance with the Web Content Accessibility Guidelines (WCAG) 2.1 4 . In addition, PDFs will have to reach PDF/UA conformity, a dedicated ISO standard 5 .
The backlist data analysis allowed us to define a wish list of categories of files to collect in order to represent the backlist composition in a small but consistent sample. Our target was 200 files from 5 countries. This objective was exceeded, with 351 files collected from 7 EU countries (Denmark, Finland, France, Germany, Italy, the Netherlands, Spain), plus some samples from the United Kingdom. We used the Thema codes as our reference for classification. Some of the samples provided were classified under several Thema categories by the publisher. As it was not possible to separate the Thema categories, we chose to duplicate these samples (one unit per Thema code, i.e. a book with Thema codes D and F was analysed twice, once as D and once as F), resulting in a total of 376 units to analyse.
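The duplication rule described above can be sketched as follows. This is a minimal illustration, not the project's actual script: the function name `expand_by_thema` and the record layout (a file name mapped to a list of Thema codes) are assumptions made for the example.

```python
def expand_by_thema(samples):
    """Return one analysis unit per (file, Thema code) pair.

    `samples` maps a file name to the list of Thema codes assigned by
    the publisher (hypothetical layout, chosen for this sketch).
    """
    units = []
    for filename, codes in samples.items():
        for code in codes:
            units.append((filename, code))
    return units


# Hypothetical sample: file names are invented for illustration.
samples = {
    "book-a.epub": ["D", "F"],  # two Thema codes: analysed twice
    "book-b.epub": ["M"],       # one Thema code: analysed once
}
units = expand_by_thema(samples)
print(units)
```

Two collected files yield three analysis units here, which is the same mechanism by which the number of units to analyse exceeds the number of files collected.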
We added to this sample one accessible EPUB3 target file 6 , to make sure that the gap emerging from our analysis reflected reality and that files already made accessible would not be considered as files with remediation needs. This target file was produced by LIA as born accessible in 2023.
We first established a list of key performance indicators (KPIs) we wanted to evaluate, and from them we determined the data to extract from the samples. We verified which of these data were available from existing reporting tools (EPUBCheck, ACE, and RGTK for EPUB files; VeraPDF and PDFIX for PDF files; see the annex Tools used in the automated analysis for a brief description of these tools) and determined which were missing. Fondazione LIA developed a script to extract the missing data, aggregate all the data, and export a unified report. The details of the tests are given in the document Detailed evaluation of the tests made on ebooks, available to project partners and contributing publishers. The script went through 15 iterations to refine data extraction and the exported reports. We started from a large set of collected data and narrowed it down to the minimum necessary.
The report was then used to develop calculation methods to define remediation complexity indicators 7 . Iterations were needed for this step as well, as the data visualisations produced helped us identify biases, missing data and irrelevant information. The results of the evaluation are presented and commented on within this document.
Usually, providers of remediation services classify the ebooks per complexity: a book with more images, tables or pages will get a higher score. This method is relevant if the whole set of ebooks to classify is produced from a known production workflow. Looking at the European level, we know that publishers’ workflows differ in the quality of files they produce, which consequently may be totally different in terms of accessibility features, accessibility information and, therefore, remediation needs.
That is why in this project we established a new classification related to remediation complexity, considering that an ebook may be very complex but already produced in accordance with accepted accessibility standards, thus resulting in a very low remediation complexity score. To be sure that the scoring was truly reflecting the remediation needs, we referred to our target file known to be fully accessible and with no remediation needs. With some iterations on scoring, we made sure that the target got a score of zero.
Capturing remediation complexities in relation to different file formats was one of the main challenges of the process. PDFs and Fixed Layout EPUBs are known to be the most complex to remediate, as the technologies and languages used to build them imply more complexities and a higher level of programmatic abstraction. That is why we decided to represent them separately.
One bias we had to deal with is that the PDF format allows for less structure and metadata, leaving fewer possibilities for analysis, which produced abnormally low scores for files in this format. To address this bias, we had to establish a complementary scoring calculation to apply to these files.
Each format therefore has specificities related to the content found in the files and to the accessibility features that are missing. To find the correct marker, a threshold of calculated key indicators was established through iterations.
Identified limits and bias
As noted previously, files in PDF format do not have the same accessibility possibilities as files in the EPUB format. Therefore, any comparison between the two formats must be made with great care and should not lead to categorical conclusions.
Most of the publishers providing samples are de facto aware of the accessibility subject and therefore the collection we have might be a biased representation of the backlist. A way to verify that would be to do a similar analysis on a large number of files not specifically selected for this type of test. This analysis perspective has been discussed with three members of EDRLab (Beletrina, De Marque and Hachette Livres) and we hope to be able to provide it as a complementary ABE Lab publication in the future.
At the time of writing this report, some remediation needs cannot be spotted automatically, but as technology is improving very fast, we expect that a better gap analysis could be produced in the coming years. Examples of accessibility problems that cannot be automatically detected are incorrect, non-meaningful or insufficient image descriptions and wrong metadata claims, for which we were not able to establish a valid calculation method during this work.
The sample contains 84% (316 files) of reflowable EPUB (RFL), 9% (33 files) of pre-paginated EPUB3 Fixed Layout (FXL) and 7% (26 files) of PDFs. This distribution does not properly represent any of the market segmentations observed in the backlist data analysis.
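As a quick check of the format distribution quoted above, the percentages can be recomputed from the file counts. This is only an arithmetic sketch: the function name `format_shares` is an assumption, and rounding to the nearest whole percent is assumed to match the figures in the text.

```python
def format_shares(counts):
    """Return the share of each format as a whole percentage of the total."""
    total = sum(counts.values())
    return {fmt: round(100 * n / total) for fmt, n in counts.items()}


# File counts per format as quoted in the paragraph above.
counts = {"RFL": 316, "FXL": 33, "PDF": 26}
shares = format_shares(counts)
print(shares)
```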
The low number of pre-paginated files in the sample limits the relevance of the analysis. It may indicate that the publishers providing samples were more interested in an accurate analysis of the remediation needs of reflowable EPUB files than of PDF and EPUB3 FXL files, as many ebooks coexist in both reflowable and pre-paginated formats.
The radar diagram and the data table on the next page show the results of the scoring. We summarise here the main trends per format:
PDF scoring ranges from 29 to 68 with representation in Thema categories A (The Arts), J (Society and Social Sciences), K (Economics, Finance, Business and Management), L (Law), M (Medicine and Nursing), P (Mathematics and Science), T (Technology, Engineering, Agriculture, Industrial processes) and V (Health, Relationships and Personal development);
EPUB3 Fixed Layout (FXL) average scoring ranges from 24 to 64 with representation in Thema categories A (The Arts), C (Language and Linguistics), D (Biography, Literature and Literary studies), P (Mathematics and Science), S (Sports and Active outdoor recreation), T (Technology, Engineering, Agriculture, Industrial processes), W (Lifestyle, Hobbies and Leisure), X (Graphic novels, Comic books, Manga, Cartoons) and Y (Children’s, Teenage and Educational);
EPUB3 reflowable average scoring ranges from 4 to 77 with representation in all Thema categories except X (Graphic novels, Comic books, Manga, Cartoons).
This overview shows a concrete difference in ranges, where reflowable formats are almost all below a score of 50 and pre-paginated formats are all over 50. As noted before, the lack of information provided in PDF files might lead to underestimating the remediation complexity. We tried to compensate for that in our scoring threshold, but remediation testing will have to establish whether the compensation is sufficient or misleading.
We also found that pre-paginated formats are not represented in every Thema code, while reflowable files are missing only for category X (Graphic novels, Comic books, Manga, Cartoons). This shows that, except for visual narratives, all types of books can be produced in a reflowable format.
From these results, it seems legitimate to treat the remediation of pre-paginated files separately from that of reflowable ones. This result will be represented in our remediation classification through the establishment of a first level of complexity related to file format.
Focus on reflowable EPUB3
As reflowable EPUB3 is the format allowing full compliance with the EAA requirements, we judged it essential to look more closely at the remediation complexity of files in this format. In the collected sample files we found scores from 4 to 73 points. The vast majority have a score between 10 and 30.
The following charts and tables give a full representation. We will summarise here the key information we found:
most of the files have a medium remediation complexity, but there is also a good number of files with high scores (fig. 4);
images to fix (meaning textual alternatives to establish) are the heaviest error, strongly affecting all categories except L (Law) and F (Fiction) (fig. 5);
most of the categories show a wide range of errors per file, meaning that the Thema category alone is not sufficient to establish a segmented average remediation cost (fig. 6).
List data: number of files per score
Score 0: 1 files
Score 1: 0 files
Score 2: 0 files
Score 3: 0 files
Score 4: 1 files
Score 5: 6 files
Score 6: 0 files
Score 7: 3 files
Score 8: 4 files
Score 9: 5 files
Score 10: 13 files
Score 11: 4 files
Score 12: 1 files
Score 13: 5 files
Score 14: 2 files
Score 15: 6 files
Score 16: 18 files
Score 17: 12 files
Score 18: 12 files
Score 19: 15 files
Score 20: 24 files
Score 21: 18 files
Score 22: 11 files
Score 23: 15 files
Score 24: 7 files
Score 25: 4 files
Score 26: 5 files
Score 27: 0 files
Score 28: 1 files
Score 29: 6 files
Score 30: 3 files
Score 31: 3 files
Score 32: 6 files
Score 33: 5 files
Score 34: 3 files
Score 35: 3 files
Score 36: 5 files
Score 37: 1 files
Score 38: 2 files
Score 39: 4 files
Score 40: 7 files
Score 41: 5 files
Score 42: 5 files
Score 43: 4 files
Score 44: 0 files
Score 45: 3 files
Score 46: 4 files
Score 47: 3 files
Score 48: 3 files
Score 49: 1 files
Score 50: 9 files
Score 51: 4 files
Score 52: 3 files
Score 53: 4 files
Score 54: 2 files
Score 55: 4 files
Score 56: 3 files
Score 57: 2 files
Score 58: 2 files
Score 59: 3 files
Score 60: 8 files
Score 61: 6 files
Score 62: 19 files
Score 63: 8 files
Score 64: 2 files
Score 65: 1 files
Score 66: 5 files
Score 67: 5 files
Score 68: 4 files
Score 69: 2 files
Score 70: 1 files
Score 71: 0 files
Score 72: 3 files
Score 73: 5 files
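For readers who want to regroup the list above into wider score bands, the following sketch parses lines in the "Score N: M files" form and sums them per band of ten points. The function names and the band width are assumptions made for this example; the three input lines are copied from the list.

```python
import re
from collections import Counter

# Matches lines such as "Score 16: 18 files" (or "... 1 file").
LINE = re.compile(r"Score (\d+): (\d+) files?")


def parse_histogram(lines):
    """Turn 'Score N: M files' lines into a {score: number_of_files} dict."""
    hist = {}
    for line in lines:
        match = LINE.fullmatch(line.strip())
        if match:
            hist[int(match.group(1))] = int(match.group(2))
    return hist


def bucket(hist, width=10):
    """Sum the number of files per band of `width` score points."""
    bands = Counter()
    for score, n_files in hist.items():
        bands[(score // width) * width] += n_files
    return dict(bands)


lines = ["Score 9: 5 files", "Score 10: 13 files", "Score 16: 18 files"]
hist = parse_histogram(lines)
by_band = bucket(hist)
print(by_band)
```

Each key of `by_band` is the lower bound of a band, so `10` stands for scores 10 to 19.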
|Thema||Publications||images to fix||unique ACE issues||possibly wrong language||files without headings|
Table 6: number of publications, minimum, average and maximum scores per Thema codes.
|Thema code||Publications||Average Score||Standard Deviation||Minimum Score||Maximum Score|
|Thema code||Publications||Average Score||Standard Deviation|
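The per-category columns used in these tables (publications, average, standard deviation, minimum and maximum score) can be derived from a list of scores per Thema code, as sketched below. The scores are made-up values for illustration, the function name `summarise` is hypothetical, and the use of the population standard deviation is an assumption, since the report does not state which variant was used.

```python
from statistics import mean, pstdev


def summarise(scores_by_code):
    """Compute publications, average, spread and extremes per Thema code."""
    summary = {}
    for code, scores in scores_by_code.items():
        summary[code] = {
            "publications": len(scores),
            "average": mean(scores),
            "std_dev": pstdev(scores),  # assumption: population std. deviation
            "minimum": min(scores),
            "maximum": max(scores),
        }
    return summary


# Made-up scores, not taken from the report.
scores_by_code = {"F": [10, 20, 30], "M": [40, 60]}
summary = summarise(scores_by_code)
print(summary["F"])
```

A large standard deviation relative to the average is what signals that the Thema category alone does not predict the remediation effort for a given file.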
Recurrent accessibility issues detected
As a complement to the gap analysis at Thema category level, we listed the main known accessibility issues and tried to identify their occurrences in the collected files. The following table summarises our findings. Results for each accessibility issue are detailed in the following sections.
|Accessibility issue||concern||Number of files||in % of the sample|
|Missing Accessibility Metadata||EPUB files||343||100|
|Non reflowable content||all formats||59||16|
|Missing or bad textual alternative for non decorative graphical resources||all formats||312||83|
|Missing or bad Language Tag||EPUB files||227||66|
|ACE Issues||EPUB files||319||93|
Missing Accessibility Metadata
Issue: no accessibility metadata are present
Rule: EPUB Accessibility 1.1, section 2 'Discoverability'
Applies to: EPUB files
Problem: readers cannot know which features or limitations they may experience while reading, and the publication cannot be discovered through filtering
Indicators: calculated as follows: missing metadata - inferred metadata 8 - 3 (conformance metadata are counted as missing by ACE, but are not required by the EAA), with a minimum of 0
Collected files affected: 100%
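One reading of the indicator above is sketched here: the count of missing metadata, minus the inferred metadata, minus the 3 conformance metadata that ACE reports but the EAA does not require, floored at zero. The function and constant names are assumptions, and the input values are invented for illustration.

```python
# Conformance metadata counted as missing by ACE but not required by the EAA.
CONFORMANCE_ALLOWANCE = 3


def missing_metadata_indicator(missing, inferred):
    """Return the number of metadata items left to remediate (never below 0)."""
    return max(0, missing - inferred - CONFORMANCE_ALLOWANCE)


# Invented values: 8 items reported missing, 2 of them inferred from the file.
print(missing_metadata_indicator(8, 2))
# A file whose only gaps are inferred or conformance metadata scores 0.
print(missing_metadata_indicator(3, 1))
```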
Non Reflowable content
Issue: the presentation of the content can’t be adjusted to fit the reader’s needs
Rule: EAA, Annex I, Section IV, f
Applies to: all formats
Problem: fixed displays prevent correct visual adaptation of the content
Indicators: pre-paginated formats
Collected files affected: 16%
Missing or bad textual alternative for non decorative graphical resources
Issue: No textual alternative is provided for informative graphical contents or the alternative is recognized as not meaningful (file name or one word)
Rule: WCAG, Guideline 1.1 Text Alternatives, Success Criterion 1.1.1 Non-text Content, level A
Applies to: all formats
Problem: non-visual readers using TTS or assistive technologies will lose important information necessary to understand the content
Indicators: calculated as follows: content images - content images with alt-text (more than one word and not equal to the filename) - decorative content images
Collected files affected: 83%
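The indicator above can be illustrated with a short sketch that counts the images still needing a textual alternative. The record layout (`filename`, `alt`, `decorative`) and the function names are assumptions made for this example, and decorative images are assumed never to be counted as having a meaningful alternative, so that the count matches the subtraction in the formula.

```python
def is_meaningful_alt(alt, filename):
    """An alt-text counts if it has more than one word and is not the filename."""
    text = (alt or "").strip()
    return len(text.split()) > 1 and text.lower() != filename.lower()


def images_to_fix(images):
    """Count non-decorative images without a meaningful textual alternative."""
    to_fix = 0
    for image in images:
        if image["decorative"]:
            continue  # decorative images need no description
        if not is_meaningful_alt(image.get("alt"), image["filename"]):
            to_fix += 1
    return to_fix


# Hypothetical image records, invented for illustration.
images = [
    {"filename": "fig1.png", "alt": "Bar chart of scores per format", "decorative": False},
    {"filename": "fig2.png", "alt": "fig2.png", "decorative": False},  # alt is the filename
    {"filename": "fig3.png", "alt": "image", "decorative": False},     # one word only
    {"filename": "rule.png", "alt": "", "decorative": True},           # decorative
    {"filename": "fig4.png", "alt": None, "decorative": False},        # no alt at all
]
print(images_to_fix(images))
```

As the report notes elsewhere, a check of this kind cannot tell whether a multi-word description is actually correct or sufficient; that still requires human review.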
Missing or bad Language Tag
Issue: words in different languages from the one of the main content are not identified as such
Rule: WCAG, Guideline 3.1 Readable, Success Criterion 3.1.2 Language of Parts, level AA
Applies to: all formats, but no way was found to identify that in PDF
Problem: non-visual readers using TTS or assistive technologies will experience odd or unintelligible reading because of mispronunciation, incorrect braille rendering and bad hyphenation
Indicators: wrong language assertions are detected through a dedicated algorithm, which targets two or more consecutive words in a sentence
Collected files affected: 66%
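The report does not describe the dedicated algorithm, so the sketch below only illustrates the "two or more consecutive words" criterion. The predicate deciding whether a word belongs to another language is a placeholder (a toy word list), and all names are hypothetical.

```python
def foreign_runs(words, is_foreign, min_run=2):
    """Return runs of at least `min_run` consecutive words flagged as foreign."""
    runs, current = [], []
    for word in words:
        if is_foreign(word):
            current.append(word)
        else:
            if len(current) >= min_run:
                runs.append(current)
            current = []
    if len(current) >= min_run:  # flush a run that ends the sentence
        runs.append(current)
    return runs


# Toy lexicon standing in for real language detection (assumption).
FRENCH_WORDS = {"déjà", "vu", "raison"}
sentence = "She had a sense of déjà vu and one raison too"
runs = foreign_runs(sentence.split(), lambda w: w.lower() in FRENCH_WORDS)
print(runs)
```

The isolated word "raison" is ignored because it does not form a run of two, which is how a minimum run length limits false positives on single borrowed words.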
Issues reported by ACE
The following table shows the number of files, and the corresponding percentage of the sample, containing errors per severity level. Very few files (only 5%) have critical issues, but 92% have serious issues that will need to be evaluated for remediation.
Table 9: number and percentage of collected files affected per ACE issue severity level.
|ACE issue||Number of files||% of the samples|
A larger table of unique ACE issues has been produced for the use of the project and the building of testing files. The details of those errors are reported in the following tables. One shows the errors for which we proposed a detailed remediation complexity KPI, while the second shows the errors that are not addressed by a specific calculation.
Table 10: percentage of the sample affected by ACE errors for which a detailed remediation complexity KPI has been established.
|ACE Issue||% of the samples affected|
Table 11: percentage of the sample affected by ACE errors for which no detailed remediation complexity KPI has been established.
|ACE Issue||% of the samples affected|
Potential accessibility issues undetectable through automated analysis
The following are issues that cannot be detected automatically and will require ad hoc human testing.
Issue: adjusting the presentation leads to letters or sentences overlapping or making the content visually unreadable in any way
Rule: EAA, Annex I, Section IV, f
Applies to: reflowable EPUB
Problem: fixed styles prevent correct visual adaptation of the content
Specific contents to be verified manually if found in files
Some very specialised content types, such as forms, scripts, maths, video and audio, are not commonly used in ebooks, but as they may occur, it will be necessary to include them in remediation testing. The following table shows that very few occurrences were found in the sample collection.
Table 12: number and percentage of collected files per specific content.
|Content||Number of files||in % of the samples|
Classification for remediation
The following classification lists the remediation workflows to test. Six workflows are spread across the different categories. In summary:
PDF to PDF/UA (compliant with WCAG 2.1, level AA)
PDF to Reflowable EPUB3 (compliant with EPUB Accessibility 1.1, WCAG 2.1, level AA)
FXL to «accessible» FXL (compliant with EPUB Accessibility 1.1, WCAG 2.1, level AA)
FXL to Reflowable EPUB3 (compliant with EPUB Accessibility 1.1, WCAG 2.1, level AA)
EPUB2 to EPUB3 (compliant with EPUB Accessibility 1.1, WCAG 2.1, level AA)
Reflowable EPUB to Reflowable EPUB3 (compliant with EPUB Accessibility 1.1, WCAG 2.1, level AA)
As a representation of the printed page, the PDF format has limited accessibility features in terms of flexibility and choice in the presentation of the content (for details about the format and its known limitations, refer to the Annex Ebooks files formats).
We see two possible remediation options for the PDF files:
improve the file to reach the PDF/UA standard with WCAG 2.1 AA conformance, allowing the file to support the text zoom functionality provided by most reading applications. These files will not fully comply with the EAA requirements, but will provide state-of-the-art compliance.
convert the file to a reflowable EPUB to reach full compliance with EAA requirements.
Fixed Layout EPUBs
Fixed layout EPUBs are subject to the same visual adjustment limitations as PDF: changing font type and spaces between letters, words, lines, or paragraphs is not possible. The possible remediations are:
improve the file to reach WCAG 2.1 AA and EPUB accessibility 1.1. As in the case of PDF files, in Fixed Layout EPUB some accessibility features are supported and others are not. If the file is made according to the specifications, however, it must support text zoom functionality provided by most of the reading applications.
convert the file to a reflowable EPUB to reach full compliance with EAA requirements.
Reflowable EPUBs are known to be fully compliant with EAA requirements 9 if they conform to WCAG 2.1 AA (or higher) and EPUB Accessibility 1.1. We found different types of remediation needs:
EPUB2 files need to be converted to reflowable EPUB3;
Reflowable EPUB3 files need to become compliant with WCAG 2.1 AA and EPUB Accessibility 1.1.
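The six workflows listed in this classification can be represented as a simple lookup from source format to candidate targets, as sketched below. The dictionary layout, the format labels and the function name are assumptions made for illustration; they are not the project's own data model.

```python
# Source format -> remediation targets, following the six workflows listed above.
WORKFLOWS = {
    "PDF": ["PDF/UA", "Reflowable EPUB3"],
    "FXL": ["accessible FXL", "Reflowable EPUB3"],
    "EPUB2": ["EPUB3"],
    "Reflowable EPUB": ["Reflowable EPUB3"],
}


def candidate_workflows(source_format):
    """Return the workflows to test for a source format (empty if unknown)."""
    return [f"{source_format} to {target}" for target in WORKFLOWS.get(source_format, [])]


total = sum(len(targets) for targets in WORKFLOWS.values())
print(total)
print(candidate_workflows("PDF"))
```

Only the conversions to reflowable EPUB3 aim at full compliance with the EAA requirements; the PDF/UA and accessible FXL targets are the intermediate options described above.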
From this gap analysis, we were able to establish a classification of remediation needs and build test files for each of the classifications.
Direct outcomes of this work are:
a remediation complexity assessment methodology applicable to collections of files;
a view of the remediation complexity per Thema category;
a view of main accessibility issues detected.
The heavy presence of images and visual resources appears to be the main criterion separating the categories that will require more effort to remediate (Medicine, Earth sciences and Sports) from the others (Fiction, Philosophy, Religion and Law), which will be easier to remediate.
For the following steps of the ABE Lab project, this work allows us to establish a testing classification and methodology, and to build meaningful files with which to test remediation tools.
Available at https://www.w3.org/TR/epub-a11y-eaa-mapping/
EAA Annex I, Section IV, f) iii) available at
On October 5, 2023 version 2.2 of the WCAG was officially released as a W3C recommendation. This update does not impact or compromise the analysis, research work and testing carried out in the context of the ABE Lab project.
Target file publicly available at
Remediation complexity indicators are available to the publisher partners of the project.
Inferred metadata are detected by RGTK, meaning that an accessibility feature can be seen in the file even if the publisher did not declare it. No remediation is therefore needed other than declaring the feature, which RGTK already automates.