Distributed video coding (DVC) has attracted considerable attention during the past decade as a new solution for video compression in which the computationally most intensive operations are performed by the decoder instead of the encoder. An important issue in many current DVC solutions is the use of a feedback channel from the decoder to the encoder to determine the rate of the coded stream. Such a feedback channel is not only impractical in storage applications; even in streaming scenarios its use may result in intolerable delays due to the typically large number of requests needed to decode one frame. Instead of reverting to a feedback-free solution by adding encoder-side rate estimation, in previous work we proposed to incorporate constraints on feedback channel usage. To cope better with rate fluctuations caused by changing motion characteristics, in this paper we propose a refined approach that exploits information available from already decoded frames at other temporal layers. The results indicate significant improvements for all test sequences (using a GOP of length four).
Achieving the high coding efficiency offered by the H.264/AVC standard makes the encoding process computationally demanding, and motion estimation is one of the most intensive encoding phases. Even modern CPUs struggle to process high-definition video sequences in real time. While personal computers are typically equipped with powerful Graphics Processing Units (GPUs) to accelerate graphics operations, these GPUs lie dormant when encoding a video sequence. Furthermore, recent developments show that more and more computer configurations come with multiple GPUs. However, no existing GPU-enabled motion estimation architectures target multiple GPUs. In addition, these architectures provide no early-out behavior, nor can they enforce a specific processing order. We developed a motion search architecture capable of executing motion estimation and partitioning for an H.264/AVC sequence entirely on the GPU using the NVIDIA CUDA (Compute Unified Device Architecture) platform. This paper describes our architecture and presents a novel job scheduling system we designed, making it possible to control the GPU in a flexible way. This job scheduling system can enforce real-time demands of the video encoder by prioritizing calculations and providing an early-out mode. Furthermore, it allows the use of multiple GPUs in one computer system and efficient load balancing of the motion search over these GPUs. This paper focuses on the execution speed of the novel job scheduling system on both single- and multi-GPU systems. Initial results show that real-time full motion search of 720p high-definition content is possible with a 32 by 32 search window running on a system with four GPUs.
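To make the scheduling idea concrete, the sketch below shows a minimal priority-driven job scheduler with an early-out flag and several GPU worker threads; the class and the run_on_gpu callback are hypothetical stand-ins for the actual CUDA dispatch described in the paper.

    # Illustrative only: priority-based scheduling with early-out over multiple GPU workers.
    import heapq, itertools, threading

    class MotionSearchScheduler:
        def __init__(self, num_gpus):
            self.queue = []                  # (priority, seq, job) min-heap
            self.counter = itertools.count() # tie-breaker so jobs are never compared
            self.lock = threading.Lock()
            self.deadline_passed = False     # early-out flag set by the encoder
            self.num_gpus = num_gpus

        def submit(self, priority, job):
            with self.lock:
                heapq.heappush(self.queue, (priority, next(self.counter), job))

        def early_out(self):
            # Called when the real-time deadline is reached; remaining
            # low-priority jobs are simply skipped.
            self.deadline_passed = True

        def _worker(self, gpu_id, run_on_gpu):
            # run_on_gpu(gpu_id, job) stands in for the actual CUDA kernel launch.
            while True:
                with self.lock:
                    if self.deadline_passed or not self.queue:
                        return
                    _, _, job = heapq.heappop(self.queue)
                run_on_gpu(gpu_id, job)      # idle GPUs pull the next job: load balancing

        def run(self, run_on_gpu):
            threads = [threading.Thread(target=self._worker, args=(g, run_on_gpu))
                       for g in range(self.num_gpus)]
            for t in threads:
                t.start()
            for t in threads:
                t.join()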
Background subtraction is a method commonly used to segment objects of interest in image sequences. By comparing new frames to a background model, regions of interest can be found. To cope with highly dynamic and complex environments, a mixture of several models has been proposed in the literature. We propose a novel background subtraction technique derived from the popular mixture of Gaussian models technique (MGM). We discard the Gaussian assumptions and use models consisting of an average and an upper and lower threshold. Additionally, we include a maximum difference with respect to the previous value and an intensity allowance to cope with gradual lighting changes and photon noise, respectively. Moreover, edge-based image segmentation is introduced to improve the results of the proposed technique. This combination of temporal and spatial information results in a robust object detection technique that deals with several difficult situations. Experimental analysis shows that our system is more robust than MGM and more recent techniques, resulting in fewer false positives and false negatives. Finally, a comparison of processing speed shows that our system can process frames up to 50% faster.
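A minimal per-pixel sketch of the described model, assuming a model consisting of lower/upper thresholds around an average, a maximum difference with the previous value, and an intensity allowance; the names and the exact combination rule are our illustration, not the authors' implementation.

    import numpy as np

    def is_foreground(frame, prev_frame, model, max_diff=25, allowance=10):
        f = frame.astype(int)
        # Pixel fits the model if it lies between the thresholds, widened by the
        # intensity allowance that absorbs gradual lighting changes and photon noise.
        within_model = ((f >= model["lower"].astype(int) - allowance) &
                        (f <= model["upper"].astype(int) + allowance))
        # Pixel is also considered background if it barely changed since the previous frame.
        small_change = np.abs(f - prev_frame.astype(int)) <= max_diff
        return ~(within_model | small_change)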
KEYWORDS: Video, Computer programming, Visualization, Detection and tracking algorithms, Video coding, Quantization, Video compression, Image processing, Video processing, Computer architecture
Nowadays, most video material is coded in a non-scalable format. When transmitting these single-layer video bitstreams, connection links with limited capacity can pose a problem. In order to solve this problem, requantization transcoding is often used. The requantization transcoder applies coarser quantization in order
to reduce the amount of residual information in the compressed video bitstream. In this paper, we extend a requantization transcoder for H.264/AVC video bitstreams with a rate-control algorithm. A simple algorithm is proposed which limits the computational complexity. The bit allocation is based on the bit distribution in the original video bitstream. Using the bit budget and a linear model between rate and quantizer, the new quantizer is calculated. The target bit rate is attained with an average deviation lower than 6%, while the rate-distortion performance shows small improvements over transcoding without rate control.
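A minimal sketch of the described rate-control idea, assuming a linear rate-quantizer model R = a - b*QP and a bit budget allocated proportionally to the bit distribution of the original bitstream; the model parameters a and b are placeholders.

    # Sketch only: compute the new quantizer for one macroblock from its bit budget.
    def new_qp(target_bits_frame, orig_bits_mb, orig_bits_frame, a, b, qp_min=0, qp_max=51):
        # Bit budget for this macroblock, following the original bit distribution.
        budget = target_bits_frame * orig_bits_mb / orig_bits_frame
        qp = (a - budget) / b              # invert the assumed linear rate-quantizer model
        return int(min(max(round(qp), qp_min), qp_max))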
KEYWORDS: Motion estimation, Computer programming, Video coding, Video, Computing systems, Video surveillance, Quantization, Video compression, Switches, Linear filtering
Distributed video coding is a new video coding paradigm that shifts the computationally intensive motion estimation
from encoder to decoder. This results in a lightweight encoder and a complex decoder, as opposed
to the predictive video coding scheme (e.g., MPEG-X and H.26X) with a complex encoder and a lightweight
decoder. Neither scheme, however, has the ability to adapt to varying complexity constraints imposed on the
encoder and decoder, which is an essential ability for applications targeting a wide range of devices with different
complexity constraints or applications with temporary variable complexity constraints. Moreover, the effect of
complexity adaptation on the overall compression performance is of great importance and has not yet been investigated.
To address this need, we have developed a video coding system with the possibility to adapt itself to
complexity constraints by dynamically sharing the motion estimation computations between both components.
On this system we have studied the effect of the complexity distribution on the compression performance.
This paper describes how motion estimation can be shared using a heuristic for dynamic complexity distribution, and how this distribution of complexity affects the overall compression performance of the system. The results show that the
complexity can indeed be shared between encoder and decoder in an efficient way at acceptable rate-distortion performance.
Detection and segmentation of objects of interest in image sequences is the first major processing step in visual
surveillance applications. The outcome is used for further processing, such as object tracking, interpretation,
and classification of objects and their trajectories. To speed up the algorithms for moving object detection,
many applications use techniques such as frame rate reduction. However, temporal consistency is an important
feature in the analysis of surveillance video, especially for tracking objects. Another technique is the downscaling
of the images before analysis, after which the images are up-sampled to regain the original size. This method,
however, increases the effect of false detections. We propose a different pre-processing step in which we use a
checkerboard-like mask to decide which pixels to process. The mask is inverted for every frame so that no pixel position is permanently excluded from analysis. In a post-processing step we use spatial interpolation to predict the
detection results for the pixels which were not analyzed. To evaluate our system we have combined it with a
background subtraction technique based on a mixture of Gaussian models. Results show that the models do not
get corrupted by using our mask and that we can reduce the processing time by over 45% while achieving detection results similar to those of the conventional technique.
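The pre- and post-processing steps could look as follows; the 4-neighbour interpolation and the 0.5 vote threshold are our own choices for illustration.

    import numpy as np

    def checkerboard(height, width, frame_idx):
        yy, xx = np.indices((height, width))
        return (yy + xx + frame_idx) % 2 == 0       # inverted on every other frame

    def fill_skipped(detection, mask):
        # The 4-neighbours of a skipped pixel were all analysed, so a simple
        # neighbour vote interpolates the missing detection results.
        d = detection.astype(float)
        padded = np.pad(d, 1, mode="edge")
        votes = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                 padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        d[~mask] = votes[~mask]
        return d >= 0.5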
In this paper, two systems for low-complexity MPEG-2 to H.264 transcoding are presented. Both approaches reuse the
MPEG-2 motion information in order to avoid computationally expensive H.264 motion estimation. In the first approach,
inter- and intra-coded macroblocks are treated separately. Since H.264 applies intra-prediction, while MPEG-2 does not,
intra-blocks are completely decoded and re-encoded. For inter-coded macroblocks, the MPEG-2 macroblock types and
motion vectors are first converted to their H.264 equivalents. Thereafter, the quantized DCT coefficients of the
prediction residuals are dequantized and translated to equivalent H.264 IT coefficients using a single-step DCT-to-IT
transform. The H.264 quantization of the IT coefficients is steered by a rate-control algorithm enforcing a constant bit-rate.
While this system is computationally very efficient, it suffers from encoder-decoder drift due to its open-loop
structure.
The second transcoding solution eliminates encoder-decoder drift by performing full MPEG-2 decoding followed by
rate-controlled H.264 encoding using the motion information present in the MPEG-2 source material. This closed-loop
solution additionally allows dyadic resolution scaling by performing downscaling after the MPEG-2 decoding and
appropriate MPEG-2 to H.264 macroblock type and motion vector conversion.
Experimental results show that, in terms of PSNR, the closed-loop transcoder significantly outperforms the open-loop
solution. The latter introduces drift, mainly as a result of the difference in sub-pixel interpolation between H.264 and
MPEG-2. Complexity-wise, the closed-loop transcoder requires on average 30% more processing time than the open-loop
system. The closed-loop transcoder is shown to deliver compression performance comparable to standard MPEG-2
encoding.
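The difference between the two architectures can be illustrated with a toy prediction chain (our own numeric example, not taken from the paper): open-loop requantization of the residuals lets the reconstruction error accumulate along the chain, whereas a closed-loop transcoder re-encodes against its own coarse reconstruction and stays in sync with the decoder.

    # "Quantize" here is just rounding to a coarser step; prediction is copying the previous value.
    def quant(x, step): return round(x / step) * step

    orig = [100, 103, 101, 107, 110, 108]          # original pixel trace
    fine, coarse = 1, 4

    # Source encoder: residuals against its own reconstruction at the fine step.
    rec, residuals = 0, []
    for x in orig:
        r = quant(x - rec, fine); residuals.append(r); rec += r

    # Open loop: requantize residuals only; the decoder's reconstruction drifts.
    rec_ol, out_ol = 0, []
    for r in residuals:
        rec_ol += quant(r, coarse); out_ol.append(rec_ol)

    # Closed loop: fully decode, then re-encode against the transcoder's own
    # coarse reconstruction, so encoder and decoder stay in sync.
    rec_src, rec_cl, out_cl = 0, 0, []
    for r in residuals:
        rec_src += r                                # fully decoded pixel
        rr = quant(rec_src - rec_cl, coarse)        # new residual, closed loop
        rec_cl += rr; out_cl.append(rec_cl)

    print("open loop  :", out_ol)   # error accumulates along the prediction chain
    print("closed loop:", out_cl)   # error stays bounded by the coarse quantization step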
In this paper, a novel compressed-domain motion detection technique, operating on MPEG-2-encoded video, is
combined with H.264 flexible macroblock ordering (FMO) to achieve efficient, error-resilient MPEG-2 to H.264
transcoding. The proposed motion detection technique first extracts the motion information from the MPEG-2-encoded
bit-stream. Starting from this information, moving regions are detected using a region growing approach. The
macroblocks in these moving regions are subsequently encoded separately from those in background regions using FMO.
This can be used to increase error resilience and/or to realize additional bit-rate savings compared to traditional
transcoding.
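A sketch of the macroblock-level processing implied by the abstract: motion-vector magnitudes extracted from the MPEG-2 stream seed a region-growing step, and the resulting moving regions are mapped to a separate FMO slice group; the thresholds and 4-connectivity are our assumptions.

    import numpy as np
    from collections import deque

    def moving_region_slice_map(mv_magnitude, seed_thr=4.0, grow_thr=1.0):
        h, w = mv_magnitude.shape                    # one MV magnitude per macroblock
        group = np.zeros((h, w), dtype=np.uint8)     # 0 = background slice group
        visited = mv_magnitude >= seed_thr
        queue = deque(zip(*np.nonzero(visited)))
        while queue:                                  # grow regions around the seeds
            y, x = queue.popleft()
            group[y, x] = 1                           # 1 = moving-region slice group
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not visited[ny, nx] \
                   and mv_magnitude[ny, nx] >= grow_thr:
                    visited[ny, nx] = True
                    queue.append((ny, nx))
        return group                                  # FMO macroblock-to-slice-group map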
KEYWORDS: Visualization, Video acceleration, Video, Computer programming, Video coding, Video processing, Computer architecture, Motion models, Standards development, 3D modeling
The coding efficiency of the H.264/AVC standard makes the decoding process computationally demanding. This has
limited the availability of cost-effective, high-performance solutions. Modern computers are typically equipped with
powerful yet cost-effective Graphics Processing Units (GPUs) to accelerate graphics operations. These GPUs can be
addressed by means of a 3-D graphics API such as Microsoft Direct3D or OpenGL, using programmable shaders as
generic processing units for vector data. The new CUDA (Compute Unified Device Architecture) platform of NVIDIA
provides a straightforward way to address the GPU directly, without the need for a 3-D graphics API in the middle. In
CUDA, a compiler generates executable code from C code with specific modifiers that determine the execution model.
This paper first presents our in-house developed H.264/AVC renderer, which is capable of executing motion compensation
(MC), reconstruction, and Color Space Conversion (CSC) entirely on the GPU. To steer the GPU, Direct3D combined
with programmable pixel and vertex shaders is used. Next, we also present a GPU-enabled decoder utilizing the new
CUDA architecture from NVIDIA. This decoder performs MC, reconstruction, and CSC on the GPU as well. Our results
compare both GPU-enabled decoders, as well as a CPU-only decoder in terms of speed, complexity, and CPU
requirements. Our measurements show that a significant speedup is possible, relative to a CPU-only solution. As an
example, real-time playback of high-definition video (1080p) was achieved with our Direct3D and CUDA-based
H.264/AVC renderers.
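As a CPU reference for the colour space conversion stage that both decoders offload to the GPU, the snippet below applies the common BT.601 full-range YCbCr-to-RGB equations; the papers' renderers may use a different variant.

    import numpy as np

    def ycbcr_to_rgb(y, cb, cr):
        # BT.601 full-range conversion; inputs are 8-bit planes of equal size.
        y, cb, cr = (c.astype(np.float32) for c in (y, cb, cr))
        r = y + 1.402 * (cr - 128.0)
        g = y - 0.344136 * (cb - 128.0) - 0.714136 * (cr - 128.0)
        b = y + 1.772 * (cb - 128.0)
        return np.clip(np.stack([r, g, b], axis=-1), 0, 255).astype(np.uint8)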
With all the hype created around multimedia in the last few years, consumers expect to be able to access multimedia content in a real-time manner, anywhere and anytime. One of the problems with the real-time requirement is that transportation networks, such as the Internet, are still prone to errors. Due to real-time constraints, retransmission of lost data is, more often than not, not an option. Therefore, the study of error resilience and error concealment techniques is of the utmost importance, since such techniques can significantly limit the impact of a transmission error. In this paper, an evaluation of flexible macroblock ordering (FMO), one of the new error resilience techniques in H.264/AVC, is made by analyzing its costs and gains in an error-prone environment. More specifically, a study of scattered slices, FMO type 1, is made. Our analysis shows that FMO type 1 is a good tool for introducing error robustness into an H.264/AVC bitstream as long as the QP is higher than 30. When the QP of the bitstream is below 30, the cost of FMO type 1 becomes a serious burden.
In order to be able to better cope with packet loss, H.264/AVC, besides offering superior coding efficiency, also comes with a number of error resilience tools. The goal of these tools is to enable the decoding of a bitstream containing encoded video, even when parts of it are missing. On top of that, the visual quality of the decoded video should remain as high as possible. In this paper, we will discuss and evaluate one of these tools, in particular the data partitioning tool. Experimental results will show that using data partitioning can significantly improve the quality of a video sequence when packet loss occurs. However, this is only possible if the channel used for transmitting the video allows selective protection of the different data partitions. In the most extreme case, an increase in PSNR of up to 9.77 dB can be achieved. This paper will also show that the overhead caused by using data partitioning is acceptable. In terms of bit rate, the overhead amounts to approximately 13 bytes per slice. In general, this is less than 1% of the total bit rate. On top of that, using constrained intra prediction, which is required to fully exploit data partitioning, causes a decrease in quality of about 0.5 dB for high quality video and between 1 and 2 dB for low quality video.
Reduction of the bitrate of video content is necessary in order to satisfy the different constraints imposed by networks and terminals. A fast and elegant solution for the reduction of the bitrate is requantization, which has been successfully applied on MPEG-2 bitstreams. Because of the improved intra prediction in the H.264/AVC specification, existing transcoding techniques are no longer suitable. In this paper we compare requantization transcoders for H.264/AVC bitstreams. The discussion is restricted to intra 4x4 macroblocks only, but the same techniques are also applicable to intra 16x16 macroblocks. Besides the open-loop transcoder and the transcoder with mode reuse, two architectures with drift compensation are described, one in the pixel domain and the other in the transform domain. Experimental results show that these architectures approach the quality of the full decode and recode architecture for low to medium bitrates. Because of the reduced computational complexity of these architectures, in particular the transform-domain compensation architecture, they are highly suitable for real-time adaptation of video content.
This paper gives an introduction to technologies and methodologies for measuring the performance of MPEG-21 applications in mobile environments. Since resources such as processing time, available memory, storage, network capacity, and battery time are very scarce on mobile devices, it is important to optimize technologies to use as few of these resources as possible. The goal of the upcoming MPEG-21 standard is to provide transparent and augmented use of multimedia resources across a plethora of networks and devices. To identify possible optimization points for MPEG-21 technologies, performance measurement techniques are applied to a prototype implementation of MPEG-21 Digital Item Declaration and Digital Item Processing. The prototype, which has been implemented on the J2ME platform, gives information about possible bottlenecks when designing MPEG-21 based applications. The results of the measurements are discussed and used to identify which improvements need to be realized to reduce memory and processor consumption when implementing the discussed parts of the MPEG-21 standards on a mobile platform. This paper ends with a discussion and concluding remarks.
KEYWORDS: Video, Quantization, Video surveillance, Video coding, Computer programming, Multimedia, Cameras, Signal to noise ratio, Raster graphics, Video compression
H.264/AVC is the newest block-based video coding standard from MPEG and VCEG. It not only provides superior and efficient video coding at various bit rates, it also has a "network-friendly" representation thanks to a series of new techniques that provide error robustness. Flexible Macroblock Ordering (FMO) is one of the new error resilience tools included in H.264/AVC. Here, we present an alternative use of flexible macroblock ordering, exploiting its ability to combine non-neighboring macroblocks in one slice. Instead of creating a scattered pattern, which is useful when transmitting the data over an error-prone network, we divide the picture into a number of regions of interest and one remaining region of disinterest. It is assumed that people watching the video will pay much more attention to the regions of interest than to the remainder of the video. We therefore compress the regions of interest at a higher bit rate than the region of disinterest, thus lowering the overall bit rate. Simulations show that the overhead introduced by using rectangular regions of interest is minimal, while the bit rate can be reduced by 30% and more in most cases. Even at those reductions the video remains pleasant to watch. Transcoders can use this information as well, by reducing only the quality of the region of disinterest instead of the quality of the entire picture when applying SNR scalability. In extreme cases the region of disinterest can even be dropped entirely, reducing the overall bit rate even further.
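A sketch of how the ROI idea maps onto FMO and quantization, assuming rectangular regions given in macroblock units; the slice-group numbering and the example QP values are ours.

    # Assign each macroblock to a slice group and a QP: ROI rectangles get their own
    # group and a lower QP (higher quality); the leftover region of disinterest gets
    # the last group and a higher QP.
    def roi_slice_groups_and_qp(mb_w, mb_h, roi_rects, qp_roi=26, qp_background=38):
        group = [[len(roi_rects)] * mb_w for _ in range(mb_h)]   # last group = background
        qp = [[qp_background] * mb_w for _ in range(mb_h)]
        for g, (x0, y0, x1, y1) in enumerate(roi_rects):          # rectangles in MB units
            for y in range(y0, y1):
                for x in range(x0, x1):
                    group[y][x] = g
                    qp[y][x] = qp_roi
        return group, qp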
Three low-complexity algorithms that allow spatial scalability in the context of video coding are presented in this paper. We discuss the feasibility of reusing motion and residual texture information of the base layer in the enhancement layer. The prediction errors that arise from the discussed filters and schemes are evaluated in terms of the Mean of Absolute Differences. For the interpolation of the decoded pictures from the base layer, the presented 6-tap and bicubic filters perform significantly better than the bilinear and nearest-neighbor filters. In contrast, when reusing the motion vector field and the error pictures of the base layer, the bilinear filter performs best for the interpolation of residual texture information. In general, reusing the motion vector field and the error pictures of the base layer gives the lowest prediction errors. However, our tests showed that for some sequences with regions of complex motion activity, interpolating the decoded picture of the base layer gives the best results. This means that an encoder should compare all possible prediction schemes combined with all interpolation filters in order to achieve optimal prediction. Obviously this would not be possible for real-time content creation.
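For reference, the evaluation metric (Mean of Absolute Differences) and the simplest of the compared interpolation filters (nearest neighbour, 2x upsampling) can be written as follows; the bilinear, bicubic, and 6-tap filters are not reproduced here.

    import numpy as np

    def mad(prediction, reference):
        # Mean of Absolute Differences between a prediction and the original picture.
        return np.mean(np.abs(prediction.astype(np.float64) -
                              reference.astype(np.float64)))

    def upsample_nearest(base_layer_picture):
        # 2x spatial upsampling by pixel repetition (nearest-neighbour filter).
        return np.repeat(np.repeat(base_layer_picture, 2, axis=0), 2, axis=1)

    # Example: prediction error when the enhancement layer is predicted from the
    # interpolated base-layer picture:
    # error = mad(upsample_nearest(decoded_base), enhancement_original)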
H.264/AVC is a new specification for digital video coding that targets deployment in a wide range of multimedia applications, such as video conferencing, digital television broadcasting, and internet streaming. This is, for instance, reflected by the design goals of the standard, which include the provision of an efficient compression scheme and a network-friendly representation of the compressed data. These requirements have resulted in a very flexible syntax and architecture that is fundamentally different from previous standards for video compression. In this paper, a detailed discussion is provided on how to apply an extended version of the MPEG-21 Bitstream Syntax Description Language (MPEG-21 BSDL) to the Annex B syntax of the H.264/AVC specification. This XML-based language facilitates the high-level manipulation of an H.264/AVC bitstream in
order to take into account the constraints and requirements of a particular usage environment. Our performance measurements and optimizations show that it is possible to make use of MPEG-21 BSDL in the context of the current H.264/AVC standard with a feasible computational complexity when exploiting temporal scalability.
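As an illustration of the kind of high-level adaptation that BSDL enables, the sketch below drops non-reference NAL units (nal_ref_idc equal to 0) from an Annex B byte stream, which is a basic form of temporal scalability; the real system operates on an XML description of the bitstream rather than on raw bytes, so this is only an analogy.

    import re

    START_CODE = re.compile(b"\x00\x00\x01")

    def drop_non_reference_nal_units(annexb_bytes):
        out = bytearray()
        positions = [m.start() for m in START_CODE.finditer(annexb_bytes)]
        positions.append(len(annexb_bytes))
        for begin, end in zip(positions, positions[1:]):
            nal = annexb_bytes[begin:end]
            if len(nal) < 4:
                out += nal
                continue
            nal_ref_idc = (nal[3] >> 5) & 0x03   # NAL header byte follows the 3-byte start code
            nal_unit_type = nal[3] & 0x1F
            # Keep everything except disposable (non-reference) non-IDR slices.
            if not (nal_unit_type == 1 and nal_ref_idc == 0):
                out += nal
        return bytes(out)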
The main goal of this work is to assess the overall imaging performance of dedicated new solid state devices compared to a traditional scintillation camera for use in SPECT imaging. A solid state detector with a rotating slat collimator is compared with the same detector mounted with a classical collimator, as well as with a traditional Anger camera. The solid state materials are characterized by a better energy resolution, while the rotating slat collimator promises a better sensitivity-resolution tradeoff. The evaluation of the different imaging modalities is done using GATE, a recently developed Monte Carlo code. Several features for imaging performance evaluation were addressed: spatial resolution, energy resolution, and sensitivity, and a ROC analysis was performed to evaluate the hot spot detectability. In this way, differences in performance between the imaging techniques were established, which allows a task-dependent application of these modalities in future clinical practice.
KEYWORDS: Video, Computer programming, Video coding, Video compression, Visualization, Video surveillance, Motion estimation, Multimedia, Standards development, Data storage
Video coding is used under the hood of many multimedia applications, such as video conferencing, digital storage media, television broadcasting, and internet streaming. Recently, new standards-based and proprietary technologies have emerged. An interesting problem is how to evaluate these different video coding solutions in terms of delivered quality.
In this paper, a PSNR-based approach is applied in order to compare the coding potential of H.264/AVC AHM 2.0 with the compression efficiency of XviD 0.9.1, DivX 5.05, Windows Media Video 9, and MC-EZBC. Our results show that MPEG-4-based tools, and in particular H.264/AVC, can keep pace with proprietary solutions. The rate-distortion performance of MC-EZBC, a wavelet-based video codec, looks very promising as well.
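The quality metric behind the comparison is the conventional PSNR of a decoded frame against the original, with the usual 8-bit peak value of 255:

    import numpy as np

    def psnr(original, decoded):
        # Peak Signal-to-Noise Ratio in dB for 8-bit frames.
        mse = np.mean((original.astype(np.float64) - decoded.astype(np.float64)) ** 2)
        if mse == 0:
            return float("inf")
        return 10.0 * np.log10(255.0 ** 2 / mse)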
KEYWORDS: Scalable video coding, Receivers, Multimedia, Video, Signal to noise ratio, Internet, Video coding, Mathematical modeling, Data modeling, Surface plasmons
The increasing diversity of the characteristics of the terminals and networks that are used to access multimedia content through the internet introduces new challenges for the distribution of multimedia data. Scalable video coding will be one of the elementary solutions in this domain. This type of coding allows an encoded video sequence to be adapted to the limitations of the network or the receiving device by means of very basic operations. Algorithms for creating fully scalable video streams, in which multiple types of scalability are offered at the same time, are becoming mature. On the other hand, research on applications that use such bitstreams has only recently begun to emerge. In this paper, we introduce a mathematical model for describing such bitstreams. In addition, we show how we can model applications that use scalable bitstreams by means of definitions that are built on top of this model. In particular, we chose to describe a multicast protocol that is targeted at scalable bitstreams. In this way, we demonstrate that it is possible to define an abstract model for scalable bitstreams that can be used as a tool for reasoning about such bitstreams and related applications.
KEYWORDS: Principal component analysis, Video, Video coding, Computer programming, Binary data, Statistical analysis, Data analysis, Chemical species, Profiling, Quantization
H.264/AVC is a video codec developed by the Joint Video Team (JVT), a cooperation between the ITU-T VCEG (Video Coding Experts Group) and ISO/IEC MPEG (Moving Picture Experts Group). This new video coding standard has a number of new features that enable significant improvements in coding efficiency. This improved coding efficiency leads to an overall more complex algorithm with high demands on memory usage and processing power. Complexity, however, is an abstract concept and cannot be measured in a simple manner.
In this paper we present a method to obtain an accurate and more in-depth view on the internals of the JVT/AVC decoder. By decoding several bit streams with different encoding parameters, various program characteristics were measured, and principal component analysis was performed on these measurements to obtain a different view on them. Our results show that the various encoding parameters have a clear impact on the low-level behavior of the decoder. Moreover, our methodology allows us to give an explanation for the observed dissimilarities.
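The analysis step can be sketched as follows: each row of the measurement matrix holds the program characteristics observed while decoding one bit stream, and PCA projects these rows onto the directions of largest variance; this is a generic PCA sketch, not the authors' tooling.

    import numpy as np

    def pca(measurements, n_components=2):
        X = measurements - measurements.mean(axis=0)        # centre each characteristic
        _, _, vt = np.linalg.svd(X, full_matrices=False)    # principal directions
        return X @ vt[:n_components].T                      # component scores per bit stream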
During recent years, the number of organizations making digital information available has increased massively. This evolution has encouraged the development of standards for packaging and encoding digital representations of complex objects (such as digital music albums or digitized books and photograph albums). The primary goal of this article is to offer a method to compare these packaging standards and best practices, tailored to the needs of the digital library community and emerging digital preservation programs. The contribution of this paper is the definition of an integrated reference model, based on both the OAIS framework and a number of additional significant properties that affect the quality, usability, encoding, and behavior of the digital objects.
KEYWORDS: Video, Multimedia, Standards development, Mobile devices, Software development, Video coding, Web services, Network architectures, Local area networks
While the price of mobile devices is dropping quickly, the set of features and capabilities of these devices is advancing dramatically. Because of this, new mobile multimedia applications are conceivable, also thanks to the availability of high-speed mobile networks like UMTS and Wireless LAN. However, creating such applications is still difficult due to the huge diversity of features and capabilities of mobile devices. Software developers also have to take into account the rigorous limitations on processing capabilities, display possibilities, and battery life of these devices. On top of that, the availability of the device resources fluctuates strongly during execution of an application, directly and strongly influencing the user experience, whereas equivalent fluctuations on traditional desktop PCs are far less prominent. Using new technology like MPEG-4, -7, and -21 can help application developers to overcome these problems. We have created an MPEG-21-based Video-on-Demand application optimized for mobile devices that is aware of the usage environment (i.e., user preferences, device capabilities, device conditions, network status, etc.) of the client and adapts the MPEG-4 videos to it. The application is compliant with the Universal Multimedia Access framework, supports Time-Dependent Metadata, and relies on both MPEG-4 and MPEG-21 technology.
In this paper, we describe a theoretical model of the spatial uncertainty for a line of response (LOR), due to the imperfect localization of events on the detector heads of the Positron Emission Tomography (PET) camera. We assume a Gaussian distribution of the position of interaction on a detector head, centered at the measured position. The probability that an event originates from a certain point in the FOV is calculated by integrating all the possible LORs through this point, weighted with the Gaussian probability of detection at the end points of the LOR. We have calculated these probabilities both for perpendicular and oblique coincidences. For the oblique case it was necessary to incorporate the effect of the crystal thickness in the calculations. We found that the probability function cannot be expressed analytically in a closed form, and it was therefore calculated by means of numerical integration. A Gaussian was fitted to the probability profiles for a given distance to the detectors. From these fits, we conclude that the profiles can be accurately approximated by a Gaussian, both for perpendicular and for oblique coincidences. The FWHM reaches a maximum at the detector heads and decreases towards the center of the FOV, as expected.
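The final analysis step can be sketched as a Gaussian fit to a numerically obtained probability profile, from which the FWHM follows as 2*sqrt(2*ln 2)*sigma; the profile itself stands in for the numerical integration over all possible LORs described above.

    import numpy as np
    from scipy.optimize import curve_fit

    def gaussian(x, amplitude, mean, sigma):
        return amplitude * np.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))

    def fit_fwhm(x, profile):
        # Initial guess: peak height, peak position, and a rough width.
        p0 = (profile.max(), x[np.argmax(profile)], (x[-1] - x[0]) / 10.0)
        (amplitude, mean, sigma), _ = curve_fit(gaussian, x, profile, p0=p0)
        return 2.0 * np.sqrt(2.0 * np.log(2.0)) * abs(sigma)   # FWHM of the fitted Gaussian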
The accurate quantification of brain perfusion from emission computed tomography data (PET-SPECT) is limited by partial volume effects (PVE). This study presents a new approach to accurately estimate the true tissue tracer activity within the grey matter tissue compartment. The methodology is based on the availability of additional anatomical side information and on the assumption that the activity concentration within the white matter tissue compartment is constant. Starting from an initial estimate of the white and grey matter activity, the true tracer activity within the grey matter tissue compartment is estimated by an alternating ML-EM algorithm. During the updating step, the constant activity concentration within the white matter compartment is modelled in the forward projection in order to reconstruct the true activity distribution within the grey matter tissue compartment, hence reducing partial volume averaging. Subsequently, the estimate of the constant activity in the white matter tissue compartment is updated based on the newly estimated activity distribution in the grey matter tissue compartment. We have tested this methodology by means of computer simulations. A T1-weighted MR brain scan of a patient was segmented into white matter, grey matter, and cerebrospinal fluid using the segmentation package of the SPM software (Statistical Parametric Mapping). The segmented grey and white matter were used to simulate a SPECT acquisition, modelling the noise and the distance-dependent detector response. Scatter and attenuation were ignored. Following the strategy described above, simulations have shown that it is possible to reconstruct the true activity distribution for the grey matter tissue compartment (activity/tissue volume), assuming constant activity in the white matter tissue compartment.
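A heavily simplified matrix-notation sketch of the alternating scheme, with the constant white matter activity added in the forward projection of the ML-EM update; the way the white matter constant is re-estimated here is our own placeholder, not the authors' rule.

    import numpy as np

    def mlem_gm_update(x_gm, wm_value, A, gm_mask, wm_mask, y, n_iter=1):
        # A: system matrix (bins x voxels); y: measured projections;
        # gm_mask / wm_mask: binary voxel masks from the segmented MR scan.
        sens = A.T @ np.ones_like(y, dtype=float)            # sensitivity image
        for _ in range(n_iter):
            # Forward projection includes the constant white matter activity.
            forward = A @ (x_gm * gm_mask + wm_value * wm_mask)
            ratio = y / np.maximum(forward, 1e-12)
            x_gm = x_gm * (A.T @ ratio) / np.maximum(sens, 1e-12)
            x_gm *= gm_mask                                   # activity only in grey matter
            # Placeholder re-estimate of the white matter constant: rescale it so the
            # forward projection matches the measurements on average (our choice).
            wm_value *= np.sum(y) / np.sum(A @ (x_gm + wm_value * wm_mask))
        return x_gm, wm_value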
The current explosive expansion of mobile communication systems will lead to an increased demand for multimedia applications. However, due to the large variety of mobile terminals (such as mobile phones and laptops) and, consequently, the wide range of terminal possibilities and characteristics, it is difficult to create a mobile multimedia application that can be used on mobile devices of different types. In this paper, we propose a mobile multimedia application that adapts its content to the possibilities of the mobile terminal and to the end-user preferences. The application also takes changing device characteristics into account. To make this possible, a software framework is set up to enable negotiation between the mobile terminal and the content server. During the initial negotiation, the concept of the Universal Multimedia Access framework is used. Subsequent negotiations take place after changes in terminal characteristics or end-user preferences, by means of time-dependent metadata. This newly created flexible and extendable framework makes it possible for multimedia applications to interact with the content provider in order to deliver an optimal multimedia presentation for any arbitrary mobile terminal at any given time.
Some of the most popular multimedia formats available today are QuickTime, Shockwave, Advanced Streaming Format, RealVideo, and MPEG-4. Since broadband Internet became widely available, these multimedia formats have evolved strongly and are extremely popular. This article analyzes these formats based on an existing reference model. This reference model is built on the state of the art in three areas: temporal models, computer-based descriptions, and synchronization mechanisms. From these three areas, a set of ten criteria describing the reference model was created. In this paper we first briefly explain the reference model and its ten criteria. Then each of the listed multimedia formats is mapped onto the reference model. Finally, a comparison based on the reference model is given. In the conclusions section we point out some of the strong and weak points of the different multimedia formats based on the comparison.
The number of terminals that have access to multimedia content by means of a network is rapidly increasing, and the characteristics of these terminals are increasingly diverse. In addition, their users can have different preferences. Therefore, the adaptation of multimedia content to a specific terminal and/or its user has become an important research issue. Such an adaptation is mainly based on two aspects: the description of the multimedia content and the description of the user environment. Both can be considered metadata and can be formatted in an XML language (e.g., MPEG-7 and CC/PP). However, it is not yet clear how a generic mapping mechanism between two such vocabularies can be realized. We feel that such a mechanism is necessary to accomplish a mature content adaptation framework. This paper describes how such a mechanism can be achieved. We attach requirements and preferences of the user environment to specific aspects of the description of multimedia content. Based on this information, we try to maximize the value of the adapted content, while making it appropriate for the terminal. We also take into account the extensibility of the existing vocabularies we focus on, because this means our mechanism will also be extensible.
A comprehensive approach to the access of archival collections necessitates the interplay of various types of metadata standards. Each of these standards fulfills its own part within the context of a 'metadata infrastructure'. Besides this, it should be noted that present-day digital libraries are often limited to the management of mainly textual and image-based material. Archival Information Systems dealing with various media types are still very rare. There is a need for a methodology to deal with time-dependent media within an archival context. The aim of our research is to investigate and implement a number of tools supporting the content management of multimedia data within digital collections. A flexible and extendible framework is proposed, based on the emerging Metadata Encoding and Transmission Standard (METS). Firstly, we focus on the description of archival collections according to the archival mandates of provenance, for the benefit of art-historical research, in an archive-theoretically correct manner. Secondly, we examine the description tools that represent the semantics and structure of multimedia data. In this respect, an extension of the present archival metadata framework towards time-based media content has been proposed, using standards such as the MPEG-7 multimedia content description standard.
The current state of the art in content description (MPEG-7) does not provide a rich set of tools to create functional metadata (metadata that contains not only the description of the content but also a set of methods that can be used to interpret, change, or analyze the content). This paper presents a framework whose primary goal is the integration of functional metadata into the existing standards. Whenever it is important not only what is in the multimedia content, but also what is happening with the information in the content, functional metadata can be used to describe this. Some examples are news tickers, sport results, and online auctions. In order to extend content description schemes with extra functionality, MPEG-7 based descriptors are defined to allow the content creator to add his own properties and methods to the multimedia data, thus making the multimedia data self-describing and manipulable. These descriptors incorporate concepts from object technology such as objects, interfaces, and events. Descriptors allow the content creator to add properties to these objects and interfaces; methods can be defined using a descriptor and activated using events. The generic use of these properties and methods is the core of the functional metadata framework. A complete set of MPEG-7 based descriptors and description schemes is presented, enabling the content creator to add functional metadata to the multimedia data. An implementation of the proposed framework has been created, proving the principles of functional metadata. This paper presents a method for adding extra functionality to metadata and hence to multimedia data. It is shown that doing so preserves existing content description methods and that functional metadata extends the possibilities of the use of content description.
Simulations and measurements of triple-head PET acquisitions of a hot sphere phantom were performed to evaluate the performance of two different reconstruction algorithms (projection-based ML-EM and list-mode ML-EM) for triple-head gamma camera coincidence systems. A geometric simulator assuming a detector with 100 percent detection efficiency and detection of trues only was used; its resolution was set equal to that of the camera system. The measurements were performed with a triple-headed gamma camera. Simulated and measured data were stored in list-mode format, which allowed the flexibility to use different reconstruction algorithms. Hot spot detectability was taken as the performance measure, because tumor imaging is the most important clinical application for gamma camera coincidence systems. The detectability was evaluated by calculating the recovered contrast and the contrast-to-noise ratio. Results show a slightly improved contrast but a clearly higher contrast-to-noise ratio for list-mode reconstruction.
We developed an iterative reconstruction method for SPECT which uses list-mode data instead of binned data and a more accurate model of the collimator structure. The purpose of the study was to evaluate the resolution recovery and to compare its performance to other iterative resolution recovery methods in the case of high noise levels. The source distribution is projected onto an intermediate layer; in this way we obtain the complete emission radiance distribution as an angular sinogram. This step is independent of the acquisition system. To incorporate the resolution of the system, we project the individual list-mode events over the collimator wells to the intermediate layer. This projection onto the angular sinogram defines the probability that a photon from the source distribution reaches this specific location on the surface of the crystal and is thus accepted by the collimator hole. We compared the SPECT list-mode reconstruction to MLEM, OSEM, and RBI. We used Gaussian-shaped point sources with different FWHM at different noise levels. For these distributions we calculated the reconstructed images at different numbers of iterations. The modeling of the resolution in this algorithm leads to a better resolution recovery compared to the other methods, which tend to overcorrect.
KEYWORDS: Brain, Photons, Data acquisition, Single photon emission computed tomography, Cameras, Imaging systems, Neuroimaging, Monte Carlo methods, Data corrections, Signal attenuation
A practical method for scatter compensation in SPECT imaging is the triple energy window technique (TEW), which estimates the fraction of scattered photons in the projection data pixel by pixel. This technique requires an acquisition of counts in three windows of the energy spectrum for each projection bin, which is not possible on every gamma camera. The aim of this study is to set up a scatter template for brain perfusion SPECT imaging by means of the scatter data acquired with the triple energy window technique. This scatter template can be used for scatter correction as follows: the scatter template is realigned with the acquired, scatter-degraded, and reconstructed image by means of the corresponding emission template, which also includes scatter counts. The ratios between the voxel values of this emission template and the acquired and reconstructed image are used to locally adjust the scatter template. Finally, the acquired and reconstructed image is corrected for scatter by subtracting the scatter estimates thus obtained. We compared the template-based approach with the TEW scatter correction technique for data acquired with the same gamma camera system and found a similar performance for both correction methods.
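For reference, the TEW scatter estimate that provides the template data is commonly computed per projection pixel from the counts in the two narrow side windows (widths w_left, w_right) and the width of the photopeak window:

    def tew_scatter_estimate(c_left, c_right, w_left, w_right, w_main):
        # Trapezoidal approximation of the scatter counts inside the photopeak window,
        # spanned by the count densities of the two side windows.
        return (c_left / w_left + c_right / w_right) * w_main / 2.0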
Gamma camera PET (Positron Emission Tomography) offers a low-cost alternative to dedicated PET scanners. However, the sensitivity and count rate capabilities of dual-headed gamma cameras with PET capabilities are still limited compared to full-ring dedicated PET scanners. To improve the geometric sensitivity of these systems, triple-headed gamma camera PET has been proposed. As is the case for dual-headed PET, the sensitivity of these devices varies with the position within the field of view (FOV) of the camera. This variation should be corrected for when reconstructing the images. In earlier work, we calculated the two-dimensional sensitivity variation for any triple-headed configuration. This can be used to correct the data if the acquisition is done using axial filters, which effectively limit the axial angle of incidence of the photons, comparable to 2D dedicated PET. More recently, these results were extended to a fully 3D calculation of the geometric sensitivity variation. In this work, the results of these calculations are compared to the standard approach for correcting 3D geometric sensitivity variation. Current implementations of triple-headed gamma camera PET use two independent corrections to account for three-dimensional sensitivity variations: one in the transaxial direction and one in the axial direction. This approach implicitly assumes that the actual variation is separable into two independent components. We recently derived a theoretical expression for the 3D sensitivity variation, and in this work we investigate the separability of our result. To do so, an axial and a transaxial profile through the calculated variation were taken and multiplied, thus creating a separable function. If the variation were perfectly separable, this function would be identical to the calculated variation. As a measure of separability, we calculated the percentage deviation of the separable function from the original variation. We investigated the separability for several camera configurations and rotation radii. We found that, for all configurations, the variation is not separable, and becomes less separable as the rotation radius becomes smaller. This indicates that, in this case, our sensitivity correction will give better results than the separable correction currently applied.
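The separability test described above can be sketched as follows: an axial and a transaxial profile through the calculated variation are multiplied into a separable function and compared to the full 2D variation via the mean percentage deviation; the profile positions and the normalization are our choices.

    import numpy as np

    def separability_deviation(sensitivity_2d):
        cy, cx = (s // 2 for s in sensitivity_2d.shape)
        axial = sensitivity_2d[:, cx]            # profile along the axial direction
        transaxial = sensitivity_2d[cy, :]       # profile along the transaxial direction
        # Outer product of the two profiles, normalized so it matches at the centre.
        separable = np.outer(axial, transaxial) / sensitivity_2d[cy, cx]
        deviation = np.abs(separable - sensitivity_2d) / np.maximum(sensitivity_2d, 1e-12)
        return 100.0 * np.mean(deviation)        # mean percentage deviation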
In the near future, broadband networks will become available to large groups of people, and the amount of bandwidth available to these users will be much greater than it is now. This availability of bandwidth will give birth to a number of new applications. Application developers will need a framework that enables them to utilize the possibilities of these new networks. In this article we present a document type that allows the addition of (meta-)information to data streams and the synchronization of different data streams. It is called SXML (Streaming XML) and is based on the eXtensible Markup Language (XML). The SXML grammar is defined in a document type definition (SXML-DTD). The content of an SXML document can be processed in real time or retrieved from disk. XML is used in a completely new manner and in a totally different environment in order to easily describe the structure of the stream. Finally, a preliminary implementation has been developed and is being tested.
An XML-based application was developed that allows accessing multimedia/radiological data over a network and visualizing them in an integrated way within a standard web browser. Four types of data are considered: radiological images, the corresponding speech and text files produced by the radiologist, and administrative data concerning the study (patient name, radiologist's name, date, etc.). Although these different types of data are typically stored on different file systems, their relationship (e.g., image file X corresponds to speech file Y) is described in a global relational database. The administrative data are referred to in an XML file, and links between the corresponding images, speech, and text files (e.g., links between limited text fragments within the text file, the corresponding fragment in the speech file, and the corresponding subset of images) are described as well. Users are able to access all data through a web browser by submitting a form-based request to the server. By using scripting technology, an HTML document containing all data is produced on the fly, which can be presented within the browser of the user. Our application was tested on a real set of clinical data, and it was shown that the goals defined above were achieved.
The 3D acquisition data from positron coincidence detection on a gamma camera can be stored in list-mode or histogram format. The standard processing of the list-mode data is Single Slice Rebinning (with a maximum acceptance angle) to 2D histogrammed projections, followed by Ordered Subsets Expectation Maximization reconstruction. This method has several disadvantages: sampling accuracy is lost by histogramming events, axial resolution degrades with increasing distance from the center of rotation, and useful events with an angle larger than the acceptance angle are not included in the reconstruction. Therefore, an iterative reconstruction algorithm operating directly on list-mode data has been implemented. The 2D and 3D versions of this iterative list-mode algorithm have been compared with the aforementioned standard reconstruction method. A higher number of events is used in the reconstruction, which results in a lower standard deviation. Resolution is fairly constant over the field of view. The use of a fast projector and backprojector reduces the reconstruction time to clinically acceptable times.
In the near future, it will be possible to perform coincidence detection on a gamma camera with three heads, which increases the geometric sensitivity of the system. Different geometric configurations are possible, and each configuration yields a different geometric sensitivity. The purpose of this work was to calculate the sensitivities for different three-headed configurations as a function of the position in the field of view, the dimensions of the detector heads, and the distance of the heads from the center of the field of view. The configurations that were compared are: a regular two-headed configuration (180 deg. opposed), a triple-headed configuration with the three heads in an equilateral triangle (120 deg.), and a triple-headed configuration with two heads in a regular two-headed configuration and the third perpendicular between the first two, which makes a U-shaped configuration. An expression was derived for any planar detector configuration to calculate the geometric sensitivity for each Line Of Response (LOR). This sensitivity was integrated to obtain the sensitivity profile, which gives the geometric sensitivity at a certain distance from the center of rotation. We found that the triangular configuration gave the best sensitivities when the heads are placed very close to each other (nearly a full-ring configuration), but for larger fields of view, the U-shaped configuration performed better.
KEYWORDS: Magnetic resonance imaging, Signal processing, In vivo imaging, Magnetism, Image processing, Image restoration, Spatial frequencies, Demodulation, Head, Time metrology
In this paper, we will introduce a resampling method for in vivo projection reconstruction (PR) magnetic resonance (MR) signals. We will describe the physical processes causing the inaccurate sampling of these signals. Based on the theoretical properties of the signal, a technique to reduce the influence of this effect on the signals will be proposed. The method will be validated using simulations and in vivo MR signals. The corrected signals will be shown to be a better approximation of the signals that would be expected on a theoretical basis.