Counteracting the contemporaneous proliferation of digital forgeries and fake news

Fake news has certainly been the expression of the moment: from political round table discussions to newspapers to social and mainstream media, it is everywhere. With such intense discussion and yet few effective ways to combat it, what can be done? Providing methods to fight back against even the least harmful hoax is a social responsibility. To look for authenticity in a wide sea of fake news, every detail is a lead. Image appearance and the semantic content of text and images are some of the main properties that can be analyzed to reveal even the slightest lie. In this vein, this work overviews some recent methods applicable to the verification of dubious content in text and images, and discusses how we can put them together as an option to curb the proliferation of unverified and phony “facts”. We briefly present the main idea behind each method, highlighting real situations where they can be applied and discussing expected results. Ultimately, we show how new research areas are working to seamlessly stitch together all these methods so as to provide a unified analysis and to establish the synchronization in space and time (the X-Coherence) of heterogeneous sources of information documenting real-world events.


INTRODUCTION
In a scenario where fake news is in every corner trying to convince readers that the most unlikely fact is an authentic truth, it is really difficult to tell genuine from phony facts. Cases such as the 2016 USA presidential election illustrate the problem: popular fake news stories were more widely shared on Facebook than the most popular mainstream news stories; many people who see fake news stories report they believe them; and the most discussed fake news stories tended to favor Donald Trump over Hillary Clinton.
In Brazil, according to Biller (2018), the combination of political polarization and passion for social media offers fertile ground for fake news in the run-up to the 2018 general elections, leading to results that could set the Brazilian society on a backward path or even favor the appearance of Fascist movements.
Furthermore, the broadcast of sensitive content, mainly pornographic, through the Internet is as dangerous as fake news. The situation is further complicated when fake news meets pornographic content and both become entangled. If 2016 was the year of "post-truth"1 (further consolidated in 2017 as the year of fake news), this year will probably be the year of DeepFakes (Morris 2018). This new form of content falsification makes use of deep learning algorithms (from the Artificial Intelligence field) to produce convincing face-swapping videos, in particular for replacing, in movie sequences, the faces of porn stars with those of mainstream celebrities. Although the term is new, it is swiftly spreading, most likely propelled by a user-friendly and controversial (Farokhmanesh 2018) application called FakeApp, which allows anyone to create this kind of fake video effortlessly.
But how can we fight back in situations involving the broadcast of fake news? In recent years, scientists have been developing research in the field of Digital Forensics to prevent or to aid the investigation of such problems. Differently from the Information Security field, whose focus is on aspects concerning system violation and unauthorized system access, Digital Forensics targets the development and deployment of methods for digital document analysis (images, videos, audio, and text), in order to evaluate, among other aspects, their authenticity.
Aiming at discussing possible ways of facing the aforementioned problems, this work brings an overview of recent research in an effort to combat fake news. We start with a discussion targeting document, image and text analysis. We then move to the study of methods for video verification and textual authorship detection. As we evolve in the presentation of ideas, we examine the rationale behind each method, highlight real situations where it could be applied, and discuss expected results. For more complex cases, when it is necessary to understand how a set of transformed images are related to each other, we review the image phylogeny framework, setting the stage for the final part of the paper. Ultimately, we show how new research areas are working to provide more than a basal disjoint source analysis for a given situation, allowing the synchronization, in space and time (the X-Coherence), of heterogeneous sources documenting and describing real-world events, leading to a thorough understanding of facts.

METHODS FOR DOCUMENT ANALYSIS
Social media platforms, such as Facebook, Twitter, and Instagram, have been revolutionizing the way people communicate with each other. They are designed to enable users to interact, collaborate, and share anything they want in the process of creating as well as consuming content (Obar and Wildman 2015). Notwithstanding, users of these engaging platforms can easily, sometimes inadvertently, consume and broadcast dubious and sensitive content, establishing grounds for fake news proliferation.
In this complex and fast-paced setup, how can we discern between pristine and fake content? How can we evaluate whether certain pieces of text, images, and videos are factual? Computational methods covered in this section can aid the task in different ways. We first present literature related to the authenticity evaluation of digital images and printed text. We then describe methods able to detect fake videos and to attribute textual authorship by taking content semantics into consideration.

FAKE IMAGE DETECTION
When analyzing suspicious documents, a suitable starting point is to evaluate the authenticity of images therein. In the digital era, images are particularly easy to manipulate using commonplace image editing software suites, such as Adobe Photoshop and Gimp, leading to an astonishing number of fake images reaching us every day through the Internet. Moreover, some studies (Nightingale et al. 2017, Schetinger et al. 2017) suggest that human beings are notably limited in the task of distinguishing original and manipulated images, even when presented with photometric or geometric inconsistencies.
Image forgery can be classified roughly into two main groups: image splicing, which refers to situations in which parts of two or more images are used to compose a new one depicting an event that never happened, as shown in Figure 1(a); and copy-paste, which takes place when parts of the image itself are replicated (often with modifications) to hide content in the same image or to increase/decrease the importance of a specific aspect, as shown in Figure 1(b).
To detect such forgeries, experts look for traces of inconsistencies in different properties, such as illumination, compression, and noise. Among these, according to Rocha et al. (2011), illumination inconsistencies are potentially effective, mainly when dealing with image splicing: from the viewpoint of a manipulator, proper adjustment of the illumination conditions is hard to achieve when creating a composite image.
Considering techniques that deal with illumination inconsistencies, we can highlight two categories: (1) methods based on the light setting, which aim at finding inconsistencies in the light source position, and (2) methods based on light color, which look for inconsistencies in the color of the illuminants of the scene. Put very simply, the former group analyzes where the source of illumination of a photograph is, while the latter investigates how objects are illuminated according to a given light source.
Methods based on the light setting are useful when evaluating the authenticity of outdoor images. One example where such methods could be applied is depicted in Figure 2, in which Marilyn Monroe and Elizabeth Taylor are side-by-side. This fake image, which circulated in social media in 2017, is the result of a composition (splicing) of at least two different photographs.
The method proposed by Carvalho et al. (2015) is appropriate to analyze this kind of image. It is based on the concept of normal vectors, often simply named "normals", which are vectors perpendicular to a surface at a given point. Given some user intervention to mark points in the suspected region, approximately corresponding to normal vectors, the method uses the normal vectors' direction and the illumination at the chosen marks to estimate the light source position for the object.
However, even a knowledgeable forensic expert is not 100% precise in the task of setting up normal marks. A proposed workaround is to estimate the light source position several times, always with a small random distortion of the original normals, to find not a single position but a region with a certain degree of confidence. In this case, if two or more objects yield inconsistent light source regions (without intersection), as depicted in Figure 2, it is indicative of image splicing, as it denotes inconsistent light sources. [...] 2010) model, which proposed that, under the assumption of Lambertian reflectance, the observed intensity can be represented by second-order spherical harmonics. Starting from this previous model, the authors proposed a more stable method for real-scenario forensic applications. Using an "intensity binning sphere" (IBS), the intensities are binned by their surface normals, avoiding extrapolation over surface normals without observations. The authors also proposed a new error score, instead of a hard threshold, which is learned from training data, i.e., from face images that are acquired under known lighting.
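The idea of estimating a light source region, rather than a single position, from randomly perturbed normals can be illustrated with a minimal sketch. It is not the implementation of Carvalho et al. (2015): it assumes a simple Lambertian model with an ambient term (I = n · l + a), a least-squares fit, Gaussian perturbation of the normals, and a region summarized as a cone (a center direction plus a 95th-percentile angular radius). All function names and parameter values are hypothetical.

```python
import numpy as np

def estimate_light(normals, intensities):
    """Least-squares fit of I = n . l + a; returns the unit light direction."""
    design = np.hstack([normals, np.ones((len(normals), 1))])
    solution, *_ = np.linalg.lstsq(design, intensities, rcond=None)
    light = solution[:-1]                      # last entry is the ambient term
    return light / np.linalg.norm(light)

def light_region(normals, intensities, trials=200, sigma=0.05, seed=0):
    """Re-estimate the light under small random distortions of the normals.

    Returns the mean direction and an angular radius (degrees) that covers
    95% of the estimates. sigma and trials are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(trials):
        noisy = normals + rng.normal(0.0, sigma, normals.shape)
        noisy /= np.linalg.norm(noisy, axis=1, keepdims=True)
        estimates.append(estimate_light(noisy, intensities))
    estimates = np.array(estimates)
    center = estimates.mean(axis=0)
    center /= np.linalg.norm(center)
    angles = np.degrees(np.arccos(np.clip(estimates @ center, -1.0, 1.0)))
    return center, float(np.percentile(angles, 95))

def regions_intersect(center_a, radius_a, center_b, radius_b):
    """Two cones intersect when the angle between centers is within the radii sum."""
    gap = np.degrees(np.arccos(np.clip(center_a @ center_b, -1.0, 1.0)))
    return bool(gap <= radius_a + radius_b)

def synthetic_object(light, count=40, seed=1):
    """Toy data: random lit normals and their Lambertian intensities."""
    rng = np.random.default_rng(seed)
    normals = rng.normal(size=(count, 3))
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    normals[normals @ light < 0] *= -1         # keep only normals facing the light
    return normals, normals @ light + 0.1      # 0.1 is an assumed ambient term
```

Under these assumptions, two objects lit from clearly different directions should produce cones that do not intersect, which is the cue for a possible splicing; real images would also require handling shadows, non-Lambertian surfaces, and user error in marking normals.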
In the second category, techniques that deal with inconsistencies in light color are very useful in more complex setups, often involving not just a single light source. One example where such methods could be applied is depicted in Figure 3, a fake image showing Putin surrounded by other world leaders, which circulated in social media in 2017.
The method by Carvalho et al. (2016) [...] possible forgeries. Illumination inconsistencies in objects with similar materials (such as human skin) become more pronounced when projecting the fake image onto illuminant maps. An illuminant map is a transformed color space which reproduces the illuminant color (the color of the light that appears to be emitted during the capture) in each region of the image.
There are different color constancy methods able to estimate scene illuminants and this work relied upon two of them: a statistical one (van de Weijer et al. 2007) used to estimate the illuminant from pixels; and a physics-based one (Riess and Angelopoulou 2010), which is a variant of the original inverse-intensity chromaticity space estimation proposed by Tan et al. (2004) to deal with local illuminant estimation.
Illuminant maps can be characterized by different statistical features: texture, color, and shape. Then, for each pair of selected faces in an image, these features are used to train different pattern classifiers, which vote to decide whether or not an image presents traces of illumination inconsistencies. If at least one pair of faces is classified as fake, the method provides a hint that the image may have been manipulated and that the examiner should look for other traces of doctoring. Figure 4 depicts a simplified overview of the method, which classifies an input as being genuine or fake.
The method proposed by Huh et al. (2018) takes advantage of the automatically recorded photo EXIF metadata as a supervisory signal for training a model to determine whether or not an image was produced by image splicing. The authors apply a Siamese network to measure the consistency c_ij of EXIF metadata attributes between two patches i and j. The Siamese network uses shared ResNet-50 (He et al. 2016) sub-networks, each one producing feature vectors with 4096 dimensions. Such vectors are concatenated and passed through a four-layer neural network with 4096, 2048, 1024 units, followed by the final output layer. Despite the interesting results, mainly on tampering map generation, scenarios involving compressed test images have not been tested, which is an inconvenient drawback since this kind of operation tends to degrade accuracy.
Figure 4. Overview of the method by Carvalho et al. (2016). An image is analyzed by segmenting all suspected faces. Each face is represented in the transformed illuminant space and further characterized through some image description methods such as the ones involving patterns of texture, color, and shape. Then, the descriptors for all pairs of faces in the image are concatenated. Ultimately, a classifier is trained based on the concatenated feature vectors to issue an authenticity decision upon receiving each pair of faces.
The aforementioned methods are effective when dealing with images generated by image composition/splicing. Nevertheless, when the forgery operation involves just parts of the same image, as in copy-move images, there are more appropriate solutions. Silva et al. (2015) proposed a method tailored for copy-move image forgery detection based on a multi-scale analysis of the input image. The input image is converted into the HSV color space to decrease possible false positive matches of similar regions. Then, the Speeded-Up Robust Features (SURF) algorithm (Bay et al. 2006) is used to detect a set of keypoints (representative regions of orientation change in the image), which are matched against each other. The method always associates keypoints in pairs by using the Nearest Neighbor Distance Ratio (NNDR) policy (Mikolajczyk and Schmid 2005). The next step consists in clustering keypoints into two groups, based on two specific constraints: (1) spatial proximity between keypoints assigned to the same group; and (2) similarity between the angles of the connection line of keypoints in the same group. The next two steps are, respectively, a Gaussian Pyramidal Decomposition to generate the image's scale-space, and a multi-scale lexicographical analysis, looking for candidate cloned regions in each scale of the pyramid. Finally, the method performs a voting process through the different pyramidal levels to find the final detection map.
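The NNDR matching step mentioned above can be sketched as follows. This is only an illustration of the general policy, not the code of Silva et al. (2015): descriptors are plain lists of numbers standing in for SURF descriptors, and the ratio threshold of 0.6 is an assumed value. A keypoint is paired with its nearest neighbor in the same image only when that neighbor is markedly closer than the second nearest one.

```python
import math

def nndr_matches(descriptors, ratio=0.6):
    """Pair each descriptor with its nearest neighbour if it passes the NNDR test.

    descriptors: list of equal-length numeric sequences from ONE image
    (copy-move detection matches an image against itself).
    ratio: assumed threshold on nearest / second-nearest distance.
    """
    matches = []
    for i, current in enumerate(descriptors):
        # Distances to every other descriptor, smallest first.
        distances = sorted(
            (math.dist(current, other), j)
            for j, other in enumerate(descriptors)
            if j != i
        )
        if len(distances) < 2:
            continue  # the ratio test needs two neighbours
        (nearest, j), (second, _) = distances[0], distances[1]
        if nearest < ratio * second:
            matches.append((i, j))
    return matches
```

For example, if descriptor 4 is a near copy of descriptor 1 while all others are far apart, only the pairs (1, 4) and (4, 1) survive the test; ambiguous keypoints whose two closest neighbors are about equally distant are discarded.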
An example of the usefulness of such techniques can be found in Figure 5. The image depicts a case which made the news years ago, when Iran was conducting missile tests (Shachtman 2008, Nizza and Lyons 2008). The image is a result of copy-pasting portions of the successful missile launches over the failing ones.
Presenting results comparable to the state-of-the-art methods, but with the drawback of a highly complex implementation framework, Wang et al. (2018) proposed a method based on a color invariance model and the quaternion polar complex exponential transform (QPCET) for the detection and localization of copy-move forgeries. The proposed method consists of five main steps: (1) extraction of stable color image interest points using a detector composed of SURF (Bay et al. 2008) features and a color invariance model; (2) [...]
Also focusing on copy-move detection, Mahmood et al. (2018) proposed a technique based on stationary wavelet and discrete cosine transforms. The method first converts the input image into the YCbCr color space. For feature extraction, the authors rely on two main steps: (1) using the stationary wavelet transform (SWT), the method decomposes the suspicious image into four sub-bands (approximation, horizontal, vertical and diagonal); (2) it divides the approximation sub-band into overlapping blocks, and uses the discrete cosine transform (DCT) to reduce them to six dimensions. This combination of SWT and DCT makes the representation of features more diverse and also appears as a better choice for copy-move detection. Using a lexicographical sorting algorithm, features are sorted and the similarity of close blocks is calculated. As a last step, a morphological opening operation with a structural element is applied over the resulting maps to eliminate falsely detected areas.
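The block-based pipeline described for copy-move detection (overlapping blocks, DCT features, lexicographical sorting, comparison of neighboring entries) can be sketched in a simplified form. This is not the method of Mahmood et al. (2018): the SWT decomposition and the morphological post-processing are omitted, the input is assumed to be a single-channel image, and the block size, the choice of the first six zigzag DCT coefficients, the quantization step, and the shift-vector voting are all illustrative assumptions.

```python
from collections import Counter

import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    basis[0, :] = np.sqrt(1.0 / n)
    return basis

def block_features(gray, block=8):
    """Six low-frequency DCT coefficients for every overlapping block."""
    basis = dct_matrix(block)
    zigzag = [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
    features, coords = [], []
    height, width = gray.shape
    for y in range(height - block + 1):
        for x in range(width - block + 1):
            coeffs = basis @ gray[y:y + block, x:x + block] @ basis.T
            features.append([coeffs[i, j] for i, j in zigzag])
            coords.append((y, x))
    return np.array(features), np.array(coords)

def candidate_offsets(gray, block=8, quant=4.0, min_shift=8):
    """Vote for shift vectors between blocks with identical quantized features."""
    features, coords = block_features(gray, block)
    quantized = np.round(features / quant).astype(int)
    # np.lexsort uses the LAST key as primary, so reverse the columns.
    order = np.lexsort(quantized.T[::-1])
    votes = Counter()
    for a, b in zip(order[:-1], order[1:]):
        if not np.array_equal(quantized[a], quantized[b]):
            continue
        dy, dx = (int(v) for v in coords[b] - coords[a])
        if max(abs(dy), abs(dx)) < min_shift:
            continue  # neighbouring blocks overlap and are trivially similar
        if dy < 0 or (dy == 0 and dx < 0):
            dy, dx = -dy, -dx  # canonical sign so opposite shifts vote together
        votes[(dy, dx)] += 1
    return votes
```

A shift vector that collects many votes suggests that a whole region, rather than an isolated block, was duplicated; compression or retouching of the cloned area would require a tolerance-based comparison instead of the exact equality used here.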
Aiming to improve forgery localization in fake images, Zhou et al. (2018) proposed a method based on a two-stream Faster R-CNN (Ren et al. 2017) network, which is independent of the process used to create the forgery (splicing, copy-paste, etc.). The first is an RGB stream that uses a ResNet101 network (He et al. 2016) to learn features from the RGB image input, which are fed into a Region Proposal Network (RPN), in order to find tampering artifacts, such as strong contrast differences and unnatural tampered boundaries. In the second, which is a noise stream, the input RGB image goes through a steganalysis rich model (SRM) filter layer to discover the noise inconsistency between authentic and tampered regions. Features from both streams are fused through bilinear pooling to detect manipulation.

PRINTED DOCUMENT SOURCE DETECTION
Although we are living in the digital era, in which we are highly connected through many digital devices, printed paperwork is (still) everywhere. Due to the decreasing costs of printer devices (dot matrix, thermal, ink-jet, or laser printers) and the increasing number of digital documents, it is difficult to ensure the authenticity of printed documents against criminal intentions. Identifying the source of a printed document might prove beneficial in investigations involving forged contractual clauses, threatening letters, illegal correspondence, fake currency and documents, among others.
In this vein, it is pivotal to recognize the device signature based on the different characteristics left by its mechanical nuances. Shang et al. (2014) proposed a method to distinguish text documents from laser printers, ink-jet printers, and copiers, using features such as noise energy, contour roughness of the character, and average gradient of the character edge region. An SVM classifier is applied to each character and a voting mechanism provides the final result, with a reported 90% accuracy. Similar approaches can be seen in (Joshi and Khanna 2018, Ferreira et al. 2015, Bertrand et al. 2013, Tsai and Liu 2013), where hand-crafted features from characters are extracted and combined for a single classification. Although similar, the feature extraction may differ, ensuring some advantages for each technique in different scenarios.
Looking for a more general solution, Ferreira et al. (2017) developed a set of tools to analyze and to recognize document ownership based on clues left behind by printer devices using a data-driven approach. In this approach, several parallel Convolutional Neural Networks (CNNs) extract meaningful discriminative patterns from the analyzed documents. The method is capable of learning distinctive attribution features directly from available training data, a remarkable advance when compared to prior art. By representing these patterns in different ways, it is possible to better identify printing artifacts based on printed characters and, therefore, enhance the document-printer attribution task.
Figure 6 shows the document attribution pipeline where documents are scanned and characters are identified. The approach is based on the analysis of small patches or regions of the analyzed document, represented by text characters. Multiple representations of the same character are used as complementary features, increasing the overall accuracy. These representations are formed by raw data (character image pixels), median filter residual (subtracting the raw image from the median filtered version provides high frequency imperfections), and average filter residual (subtracting the raw image from the average filtered version isolates border effects). All three representations are used as input by shallow CNNs, which learn the most relevant discriminant characteristics. The created feature vectors are concatenated and used as input by a set of linear classifiers, a step called "early fusion". This step is represented by the middle block in the figure.
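The three character representations described above (raw pixels, median filter residual, and average filter residual) can be illustrated with a short sketch. It is not the code of Ferreira et al. (2017): the 3 × 3 window and the edge padding are assumptions, and the helper names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_filter(image, reducer, size=3):
    """Apply a size x size sliding-window filter (e.g. np.median or np.mean)."""
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")   # assumed border handling
    windows = sliding_window_view(padded, (size, size))
    return reducer(windows, axis=(-2, -1))

def character_representations(char_image):
    """Return raw pixels plus median and average filter residuals.

    Each residual is the filtered version minus the raw image, following
    the description in the text.
    """
    raw = np.asarray(char_image, dtype=float)
    median_residual = local_filter(raw, np.median) - raw
    average_residual = local_filter(raw, np.mean) - raw
    return raw, median_residual, average_residual
```

Each of the three arrays has the same shape as the character patch, so each can feed its own shallow network; the feature vectors those networks produce would then be concatenated for the early fusion step.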
The classification results at character level are combined by a majority voting mechanism, called "late fusion", represented by the next block in the figure. The final decision-making process states which printer, from a set of suspect printers, was used to print the document under investigation.
Figure 6. [...] Ultimately, all features are combined for decision-making.
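The late fusion step, in which character-level decisions are combined into a single document-level decision by majority voting, can be sketched in a few lines. The function name and the returned vote share are illustrative assumptions rather than part of the original method.

```python
from collections import Counter

def late_fusion(character_predictions):
    """Majority vote over per-character printer labels.

    Returns the winning printer and the fraction of characters that
    voted for it. Ties are resolved in favour of the label seen first.
    """
    if not character_predictions:
        raise ValueError("at least one character prediction is required")
    votes = Counter(character_predictions)
    printer, count = votes.most_common(1)[0]
    return printer, count / len(character_predictions)
```

For instance, `late_fusion(["A", "B", "A", "A", "C"])` returns `("A", 0.6)`; the vote share gives a rough indication of how unanimous the character-level classifiers were.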

FAKE VIDEO AND SENSITIVE MEDIA DETECTION
With the spreading availability of user-friendly applications to generate fake videos, the so-called deep fakes, some of which are used solely to generate pornographic content effortlessly (Farokhmanesh 2018), comes the pressing need for methods to reliably detect fake videos. Because the generated videos are incredibly realistic, it can be extremely difficult, for humans and computers, to discern between original and synthetic content, which becomes even more challenging when those videos are shared in low resolution, with various compression artifacts. Some recent methods in the literature have gained momentum due to their ability to generate compelling manipulated content, by face reenactment in real-time (Thies et al. 2016), by learning lip-sync from audio (Suwajanakorn et al. 2017), or by animating static images (Averbuch-Elor et al. 2017). Being able to accurately detect such content would aid in ceasing the proliferation of fake news by, for instance, blocking or tagging the manipulated images and videos disseminated in social media.
Mainly due to the lack of data (Rössler et al. 2018), which is a requirement for training modern machine learning methods, research in video manipulation detection is rather limited, in contrast to the research scenario discussed for images. The literature focuses on some simple clues, often found in carelessly generated videos, such as insertion and deletion of frames (Gironi et al. 2014, Smith et al. 2017), copy-move manipulations (Bestagini et al. 2013), and green-screen splicing (Mullan et al. 2017).
The presented scenario indicates that more research is necessary, considering newly proposed datasets (Rössler et al. 2018) which, for instance, enable the study of the effect of video compression on the detection task, a problem often overlooked in the literature. One research area that might help in this effort is pornography and violence detection, which are common themes in fake news, especially in videos. Such methods could be used, for instance, to tag content prior to the forgery detection.
Considering pornography and violence detection, there are some works in the literature targeting broader contexts. Moreira et al. (2016) proposed a detection and localization method for general pornography and violence scenes in videos. In a parallel work, Perez et al. (2017) presented a similar solution but using deep-learning techniques.
Child pornography is a serious ramification of general pornography, which only recently has gained proper attention. Automatically distinguishing child pornography from adult pornography and regular everyday images/videos is the main goal of a work conducted by our research group in collaboration with the Brazilian Federal Police and several universities (Vitorino et al. 2018). Based on data-driven strategies, the approach consists of first training CNNs to address different tasks for which there is a massive amount of available training examples, such as general image classification (objects, persons, cars, etc.). Then, through transfer learning techniques, the networks are fine-tuned first for general pornography detection and then further refined for child pornography content detection, outperforming different off-the-shelf solutions.

TEXTUAL AUTHORSHIP DETECTION - WHO DID IT?
According to the statistics company Statista,2 Twitter is currently among the most popular social networks worldwide, with some 330 million active users who are able to read and post short messages, the so-called tweets. In this sea of users and tweets, fans can happily interact with their idols, such as the pop band Coldplay or the soccer player Cristiano Ronaldo. However, in the same way images are forged to generate fake news, this technology can also be used for shady purposes.
In 2015, the New York Times documented the case of a Russian media agency that allegedly ran organized disinformation campaigns on social media using pseudonyms and virtual identities (Chen 2015). Running an office full of media professionals, the agency succeeded in promoting fake news stories and influencing public opinion on politics. Cases such as this one are examples of how online anonymity can reduce accountability, making it a powerful trigger for fake news. Early in 2018, another extensive report on fake profiles on social media made the news. A Times report delved into social media's black market, in which fake profiles can be bought to boost online popularity (Confessore et al. 2018). Equally alarming are estimates that some 48 million of Twitter's reported active users are automated accounts seeking to simulate real people, according to the article.
The problem of text authorship attribution based on short sequences of text is not new. Sanderson and Guenter (2006) evaluated the usage of word sequence kernels based on Markov chains for words and characters. The short texts considered range from 300 to 5,000 words, far more than Twitter's 280-character limit (historically 140 characters). Stamatatos (2009) highlighted the difficulties of short-text scenarios and their associated challenges. Even considering an accumulative representation, which is considered best when only short text is available, the text length is still a major issue. Focusing on cybercrime identification from short texts shared on Twitter, Layton et al. (2010) adopted the Source-Code Author Profile (SCAP) methodology, introducing new preprocessing methods for texts of 140 characters or less. Schwartz et al. (2013) described the concept of a k-signature for an author, which is formed by character and word n-grams. They also described a new feature, called flexible patterns, to capture fine-grained nuances in an author's style. Looking to identify an author's style from tweets, Bhargava et al. (2013) blended several lexical and tweet-specific metrics. These metrics were later evaluated by Overdorf and Greenstadt (2016) in a cross-domain scenario, for which the authors proposed specific feature selection methods.
To aid the fight against this lack of accountability in social networks, Rocha et al. (2017) discussed a general framework, which has the advantage of being scalable to a high number of suspects. It is composed of training and testing stages.
In the training stage, messages associated with suspects' accounts are collected from social media and pre-processed in order to remove sparse features, such as numbers, dates, times, URLs, very short messages with only a few characters, and non-English messages, which enforces the consistency required in the subsequent feature extraction step. All features are then combined into feature sets based on the common bag-of-words model (Salton and McGill 1986). The authors implemented different strategies for this step: character-level n-grams, word-level n-grams, part-of-speech n-grams, and diverse lexical and syntactic statistics as features. This form of characterization captures stylistic features of an author (for instance, a tendency to use capital letters over lowercase ones), patterns of use of emojis and other social media conventions, as well as vocabulary richness and user-specific grammar constructions. The feature sets are used to train a classifier, such as Power Mean SVM (Wu 2012), W-SVM (Scheirer et al. 2014), Random Forests (Breiman 1996), SCAP (Frantzeskou et al. 2007), or compression-based attribution (Teahan and Harper 2003).
The test stage starts with a message of unknown authorship, which goes through exactly the same feature extraction process as the training messages. The resulting feature vector is submitted to the pre-trained classifier, which produces a prediction of its authorship. This result points out the most probable suspect from a set of possible ones. Although it represents an important step towards understanding the difficult problem of authorship attribution for very short messages in social networks, this work also highlights the necessity of developing more informative features capable of capturing the stylistic nuances of each person in order to achieve a better classification. Figure 7 depicts an overview of the method.
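The training and test stages described above can be illustrated with a minimal sketch in Python. It is not the implementation of Rocha et al. (2017): it uses only character-level n-grams and a simplified SCAP-style profile intersection as the classifier, it skips the language filter, and the function names, regular expressions, profile size, and toy messages are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical patterns for the sparse features named in the text.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NUM_RE = re.compile(r"\d+(?:[:/.,-]\d+)*")  # numbers, dates, times


def preprocess(message, min_chars=5):
    """Remove URLs and numeric tokens; return None for very short messages."""
    text = URL_RE.sub(" ", message)
    text = NUM_RE.sub(" ", text)
    text = " ".join(text.split())
    return text if len(text) >= min_chars else None


def char_ngrams(text, n=4):
    """Bag of character-level n-grams (case and punctuation are kept)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def build_profile(messages, n=4, size=500):
    """SCAP-style profile: the `size` most frequent n-grams of one author."""
    counts = Counter()
    for message in messages:
        text = preprocess(message)
        if text is not None:
            counts.update(char_ngrams(text, n))
    return {gram for gram, _ in counts.most_common(size)}


def train(messages_by_author, n=4, size=500):
    """Training stage: one profile per suspect."""
    return {author: build_profile(msgs, n, size)
            for author, msgs in messages_by_author.items()}


def attribute(message, profiles, n=4):
    """Test stage: same feature extraction, then pick the largest overlap."""
    text = preprocess(message) or ""
    query = set(char_ngrams(text, n))
    scores = {author: len(query & profile)
              for author, profile in profiles.items()}
    return max(scores, key=scores.get), scores


# Toy data, invented for this example.
training = {
    "alice": ["OMG this is SO COOL!!! check http://t.co/abc",
              "I CANNOT believe it!!! SO happy today"],
    "bob": ["meh, nothing much going on tbh...",
            "tbh the game was kinda boring... meh"],
}
profiles = train(training)
suspect, scores = attribute("SO COOL!!! I am SO happy", profiles)
print(suspect, scores)
```

Keeping case and punctuation inside the n-grams is deliberate: habits such as capitalization and repeated exclamation marks are exactly the kind of stylistic cue the characterization step is meant to capture. In practice the profile intersection would be replaced by one of the classifiers listed in the text.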

MULTIMEDIA PHYLOGENY: UNDERSTANDING THE INTERPLAY OF DIGITAL OBJECTS
The efficient techniques presented thus far can be applied to specific forgery cases. Taking a wider perspective, more complicated situations can be easily found, involving several doctored images, in which the original (source) image is replicated and a set of transformations is applied to generate new images. Although all images share common characteristics, they might convey completely different messages. Considering our highly connected world, with its social networks and the universal language of images and videos, visual content can go viral on a worldwide scale very rapidly. Understanding the relationship of digital objects and their interplay is at the core of Multimedia Phylogeny, a research area focused on understanding the history and evolution over time of digital objects as a group rather than whether or not isolated objects are authentic. [...] (Christensen 2016); and the 2011 Benetton's online Unhate ad campaign (Pownall 2015). Still, the list goes on and on and this is just the tip of the iceberg.
Although recognizing exact duplicates of an image is straightforward, identifying semantically similar images (when transformations are applied) and their compositions is a challenging task. Moreover, identifying the source image and the relationships among all images under investigation is paramount to support digital forensic analysis. To aid in this complex analysis, research groups have been developing several methods (Dias et al. 2013b, Melloni et al. 2014, Oikawa et al. 2016, Costa et al. 2016) for multimedia phylogeny and provenance integrity over the years.
Inspired by the biological process of inheritance of characteristics, multimedia phylogeny aims at identifying the relationships within a set of near-duplicate or semantically similar images. Relationships are mapped as an image phylogeny tree (IPT) or forest (IPF), enabling the identification of the temporal sequence of modifications based on an image's ancestral lineage and descendants. The IPTs/IPFs are represented by directed acyclic graphs, where arcs are created and weighted using a dissimilarity function.
[...] digital images, but the ideas also apply to videos and text. We devise methods for dealing with partially constructed dissimilarity matrices, which can actively request new entries on the fly. This means that we can start the procedure by calculating only a subset of the entries of the dissimilarity matrix, and the methods we propose will take care of finding additional entries while optimizing the search in order to use as few entries as possible. We perform experiments with more than 1 million test cases and show that our solutions represent a step forward in efficiently and effectively determining ancestral relationships of digital images.

An Acad Bras Cienc (2019) 91(Suppl.1) e20180149

Image Phylogeny Formalization
Finding the structure of relationships in a set of near-duplicate images normally requires two steps: a dissimilarity function d, responsible for calculating how different each pair of images is and how likely it is that they are parent and child in the tree, and a tree reconstruction algorithm that operates on the resulting dissimilarity matrix. The final product of image phylogeny is an image phylogeny tree (IPT), which connects images according to their ancestor/descendant relations.
Formally, let $T_{\beta}$ be an image transformation from a family $\mathcal{T}$, parameterized by $\beta$. We can devise a dissimilarity function between two images $I_A$ and $I_B$ as the minimum over all possible values of $\beta$ that parameterize $T_{\beta}$:

$$d(I_A, I_B) = \min_{\beta} \left| I_B - T_{\beta}(I_A) \right|_{L} \qquad (1)$$

Equation 1 measures the amount of residual between the best transformation of $I_A$ to $I_B$, according to the family of operations $\mathcal{T}$, and $I_B$. We can perform such a comparison using any point-wise method $L$, such as pixel-wise minimum squared error. With a set of n near-duplicate images, the first task for creating an IPT is to calculate the dissimilarity between every pair of such images. For that, we need a reasonable set of possible image transformations, $\mathcal{T}$, from which one image can generate an offspring.

Building an Image Phylogeny Tree
Dias and his colleagues proposed an approach for calculating image dissimilarities and finding IPTs. For the dissimilarity calculations, the authors define these steps:
1. Find interest points in each pair of images (using SURF) to estimate warping and cropping parameters robustly using RANSAC.
2. Calculate pixel color normalization parameters by mapping the color channels of one image onto the color channels of the other image.
3. Compress one of the images with the same compression parameters as the other.
4. Uncompress both images and compare them pixel by pixel according to the minimum squared error metric.
Figure 1 depicts the process. The last step consists of finding the actual phylogeny tree associated with the calculated dissimilarity matrix. The Oriented-Kruskal algorithm, for example, finds the root of the tree.
Figure 1. Image phylogeny tree construction process. To calculate image dissimilarities between a pair of images $I_1$ and $I_2$, we find robust points of interest in both images, and for those which are good matches (yellow stars), we calculate a homography matrix representing the necessary parameters to transform one image to another's domain. Once we perform the mapping, we can compare both images pixel by pixel within the region of interest they overlap. Then, a homography matrix is calculated to enable pixel-by-pixel comparison and, therefore, the dissimilarity matrix. Adapted from (Dias et al. 2013a).

In the method proposed by Dias et al. (2013a), the first step is to identify the relationship between images pixel by pixel. Relevant points are matched between two images (ancestor/descendant) using the SURF (Bay et al. 2008) and RANSAC (Fischler and Bolles 1987) methods. Normalization and compression techniques are applied, and a pixel-wise comparison is carried out based on a homography matrix, with the minimum squared error as the metric. Figure 9 shows an example of this process, which results in the dissimilarity matrix of n near-duplicate images. Based on this matrix, a modified minimum spanning tree algorithm (Dias et al. 2012) constructs the IPT as a second step. The process starts with an n-root forest and sorts the dissimilarity matrix elements based on their computed dissimilarity values. Different trees are joined according to their sorted order.
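The tree-construction step (start from an n-root forest, sort the dissimilarity entries, join trees in sorted order) can be sketched in Python in the spirit of the Oriented Kruskal algorithm. This is an illustrative sketch rather than the authors' code: the convention that d[i][j] is the cost of image i being the parent of image j, the function name, and the toy matrix are assumptions.

```python
def oriented_kruskal(d):
    """Build an image phylogeny tree from an n x n dissimilarity matrix.

    Assumed convention: d[i][j] is the cost of image i being the parent of
    image j (the matrix is not symmetric). Returns a list `parent` where
    parent[j] is the chosen parent of image j and None marks the root.
    """
    n = len(d)
    parent = [None] * n        # start with a forest of n single-node trees
    rep = list(range(n))       # union-find representative = root of the tree

    def find(x):
        # Follow representatives up to the tree root (with path halving).
        while rep[x] != x:
            rep[x] = rep[rep[x]]
            x = rep[x]
        return x

    # Sort every directed pair (i -> j) by increasing dissimilarity.
    edges = sorted((d[i][j], i, j)
                   for i in range(n) for j in range(n) if i != j)

    joined = 0
    for cost, i, j in edges:
        if joined == n - 1:    # a tree over n nodes has n - 1 arcs
            break
        # Accept i -> j only if j is still the root of its own tree and
        # i belongs to a different tree; otherwise skip the entry.
        if parent[j] is None and find(i) != j:
            parent[j] = i
            rep[j] = find(i)   # the merged tree keeps the root of i's tree
            joined += 1
    return parent


# Toy matrix invented for this example: 0 -> 1 -> 2 and 0 -> 3 are cheapest.
d = [
    [0, 1, 5, 2],
    [9, 0, 1.5, 8],
    [9, 9, 0, 9],
    [9, 9, 9, 0],
]
tree = oriented_kruskal(d)
print("parents:", tree, "root:", tree.index(None))
```

The only difference from the classical Kruskal procedure is the orientation check: an arc is accepted only when its endpoint has no parent yet, so every image ends up with at most one ancestor and the single node left without a parent is reported as the root.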
In real-world setups, the complete set of near-duplicates is often not available, forcing the technique to deal with missing nodes. In addition, multiple source images can exist, particularly in splicing/composition cases. In this scenario, a forest of IPTs needs to be constructed.
The multiple-tree approach is also applicable for analyzing evidence involving image montages, blending, or a combination of different camera viewpoints. Such extensions are called multiple parenting phylogeny (Oikawa et al. 2016). Going back to "The Situation Room" example, Figure 10 shows how the proposed IPT helps in understanding the relationship between the related images collected from the Internet and their process of evolution from the original photograph taken by White House photographer Pete Souza.

WHAT COMES NEXT IN TERMS OF UNDERSTANDING REAL-WORLD EVENTS
Criminal activities evolve and adapt quickly, with fake news particularly in the spotlight, evincing the necessity of effective tools to help us answer the four most important aspects of an event: "who", "in what circumstances", "why", and "how". The techniques covered in this article are prime examples of research aiming at answering one or more of these questions. However, a much richer "bird's eye view" is pivotal to fully understand the nuances and details of an event.
A few research efforts currently focus on representing and understanding real-world events as a whole, from media content immersed in a sea of fake news. Two projects, DéjàVu (Rocha 2017) and Forensic Architecture (Weizman et al. 2014), can be highlighted due to their heterogeneity regarding the representation and tools considered to tackle the problem.
The recently launched DéjàVu project (Rocha 2017) focuses on synchronizing, in space and time, all multimedia information collected from a target event, enabling fact-checking and mining persons, objects, and contents of interest. This process of synchronization is referred to as X-coherence. Such multimedia information may come from heterogeneous sources, such as social media, the Internet, and surveillance cameras. The synchronization allows us to better understand an event by virtually reconstructing it: the before, the during, and the aftermath. Once we can move through the reconstructed event, we have a higher chance of answering the important forensic questions mentioned before, likely providing irrefutable evidence of what really happened.

Datasets and source code
All test cases and the source code of the methods developed in multimedia phylogeny are freely available. The datasets are registered at http://dx.doi.org/10.6084/m9.figshare.1012816, and can also be downloaded at http://www.recod.ic.unicamp.br/~oikawa/datasets.html. The source code is available in a public repository at http://repo.recod.ic.unicamp.br/public/projects.

Open issues
In multimedia phylogeny, most of the work has been developed for images, but other types of media are equally important and also create different phylogenetic structures. In particular, some important issues remain to be solved in video phylogeny. Videos are often created by combining several shots from other videos, which makes the implementation of multiple parenting phylogeny for videos an important problem to be explored. In addition, for a better performance, instead of evaluating the distances using the entire sequence of a set of video files, [...]
M. A. Oikawa et al. / Intl. Trans. in Op. Res. 23 (2016) 921-946
Fig. 12. Reconstruction of the image phylogeny tree of images collected from the Internet portraying The Situation Room episode. This IPT was reconstructed using the E-AOB method.

With the X-coherence, which refers to feature-space-time coherence, it is possible to find physical (e.g., where something happened), temporal (e.g., when it took place), and feature (e.g., creating a transformed and unified feature space so as to allow content discovery and pattern understanding) relations. It can be seen as a natural evolution of the traditional multimedia phylogeny solutions, in particular of multiple parenting phylogeny, and as the ultimate integration of all forensic analysis pieces.
In order to hint at the strength of X-coherence, Lameri et al. (2014) analyzed a pool of videos related to specific events, first focusing on the reconstruction of longer parent sequences describing the event itself. By mainly focusing on the Boston Marathon Bombing event (NEW YORK TIMES 2013), where two bombs went off near the marathon finish line, the proposed technique was capable of reconstructing longer sequences of videos (at times complementing the smaller ones) associated with the event. Considering the social impact of this event and, therefore, the flood of data produced by social and mainstream media, it was possible to provide the right chronological sequence, joining assorted multimedia materials in order to support, among other aspects, suspect identification and event understanding, which is one of the goals of the X-coherence synchronization.3
3 A video demonstrating the technique is available at http://tinyurl.com/q6fslbj and more information about the project can be found at http://dejavu.ic.unicamp.br/.
Forensic Architecture (Weizman et al. 2014), on the other hand, is a project and multidisciplinary research group that uses architectural techniques to investigate cases of human rights violations around the world. It aims at producing and presenting architectural evidence in contemporary conflicts. By analyzing shared media, they are able to model dynamic events as they unfold in space and time, by creating navigable 3D models and interactive cartographies of sites of conflict. These techniques allow the presentation of the events

Figure 1 - Simplified scheme of the image forgery process: (a) image splicing creation, and (b) copy-paste image creation.

Figure 3 - Example of image splicing, which circulated in social media in 2017. The left figure depicts an image composition while the right one depicts the host image used to construct the fake one. Source: Gizmodo (Novak 2017b).

Figure 4 - Overview of the method proposed by Carvalho et al. (2016). An image is analyzed by segmenting all suspected faces. Each face is represented in the transformed illuminant space and further characterized through some image description methods such as the ones involving patterns of texture, color, and shape. Then, the descriptors for all pairs of faces in the image are concatenated. Ultimately, a classifier is trained based on the concatenated feature vectors to issue an authenticity decision upon receiving each pair of faces.

Figure 5 - The Iranian Missile Case (Shachtman 2008, Nizza and Lyons 2008). Patterns of a successful missile launch were replicated onto the failing one, a clear case of copy-pasting portions of an image so as to change its meaning. Illustration by The New York Times; photo via Agence France-Presse.

Fig. 8. Proposed multiple representations of different data for laser printer attribution through a set of lightweight Convolutional Neural Networks. Early and late fusion steps are highlighted in blue and green, respectively.

Fig. 8 .
Fig. 8. Proposed multiple representations of different data for laser printer attribution through a set of lightweight Convolutional Neural Networks.Early and late fusion steps are highlighted in blue and green, respectively.
al.: DATA-DRIVEN FEATURE CHARACTERIZATION TECHNIQUES FOR LASER PRINTER ATTRIBUTION 1865 ple of filters weights for the first convolutional layer operating put image pixels.Weight values are mapped in grayscale.olutional output of the first layer of the trained network, given an om an investigated printer.For each filter, different areas inside borders are highlighted.his is the first deep network custom-tailored to the ibution problem.The used CNN architecture has ng layers: input layer, where the raw image or a different sentation (median filter residual or average filter ual) is used.It requires 28 × 28 images as input.

Fig. 8 .
Fig. 8. Proposed multiple representations of different data for laser printer attribution through a set of lightweight Convolutional Neural Networks.Early and late fusion steps are highlighted in blue and green, respectively.

…

Figure 6 - Document attribution pipeline overview. Adapted from (Ferreira et al. 2017). A suspected document is analyzed through a series of non-linear transformations by means of convolutional neural networks in order to extract distinctive attribution patterns. Ultimately, all features are combined for decision-making.

Figure 7 - Overview of the method proposed by Rocha et al. (2017) for authorship attribution of very short online messages. Adapted from (Rocha et al. 2017).

Figure 8 - "The Situation Room" photo. (a) Source image taken by White House photographer Pete Souza; (b) a version produced by a Brooklyn-based Hasidic newspaper, which removed Secretary of State Hillary Clinton and another woman from the photo. Source: (HUFFPOST 2011); and (c) a meme depicting the politicians as superheroes. Source: (ENews 2012).

Figure 9 - Image mapping process: to calculate the dissimilarity between I1 and I2, robust points of interest are computed. Then, a homography matrix is calculated to enable pixel-by-pixel comparison, which in turn yields the dissimilarity matrix. Adapted from (Dias et al. 2013a).
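
The mapping step described in the Figure 9 caption can be illustrated with a minimal sketch. It assumes that the homography H has already been estimated (in practice from matched points of interest), that images are grayscale NumPy arrays, and that the pixel-by-pixel comparison is a mean squared error over the overlapping region. The function names, the nearest-neighbor resampling, and the choice of error measure are assumptions made for illustration, not the implementation of Dias et al. (2013a).

```python
import numpy as np

def warp_with_homography(img, H, out_shape):
    """Resample a grayscale image into a frame of size out_shape.

    H is assumed to map homogeneous (x, y, 1) coordinates of img to the
    output frame; sampling is nearest-neighbor (an illustrative choice).
    Assumes the projective scale is nonzero for every output pixel.
    """
    h, w = out_shape
    ys, xs = np.mgrid[0:h, 0:w]
    dst = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = np.linalg.inv(H) @ dst              # inverse mapping: output -> source
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    # Keep only output pixels whose source location falls inside img.
    valid = (sx >= 0) & (sx < img.shape[1]) & (sy >= 0) & (sy < img.shape[0])
    out = np.zeros(h * w, dtype=float)
    out[valid] = img[sy[valid], sx[valid]]
    return out.reshape(h, w), valid.reshape(h, w)

def dissimilarity(i1, i2, H):
    """Mean squared error between i1 mapped onto i2's frame and i2."""
    warped, valid = warp_with_homography(i1, H, i2.shape)
    if not valid.any():
        return float("inf")                   # no overlap between the images
    diff = warped[valid] - i2[valid].astype(float)
    return float(np.mean(diff ** 2))

# Hypothetical example: i2 is i1 shifted one pixel to the right.
i1 = np.arange(36, dtype=float).reshape(6, 6)
i2 = np.zeros_like(i1)
i2[:, 1:] = i1[:, :-1]
H_shift = np.array([[1.0, 0.0, 1.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
d_aligned = dissimilarity(i1, i2, H_shift)    # compared after alignment
d_naive = dissimilarity(i1, i2, np.eye(3))    # compared without alignment
```

Evaluating such a function for every ordered pair of images in a set would fill one entry of the dissimilarity matrix per pair; the asymmetry arises because mapping I1 onto I2 is in general not the inverse of mapping I2 onto I1 once further transformations are taken into account.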

Figure 10 - IPT construction example using images collected from the Internet about "The Situation Room" episode. Adapted from (Oikawa et al. 2016).
can be properly applied in this case. It explores image transformed spaces to capture artifacts and pinpoint
An Acad Bras Cienc (2019) 91(Suppl. 1) e20180149

[Figure 7 diagram labels: Social Network; Anonymous Message; Collect suspects' old messages; Training Data Messages; Pre-processing; Pre-processing and BoW Feature Generation; Classifier; Probable Suspect]