
# Journal and Conference Papers

Data quality is growing in relevance as a research topic. This is becoming increasingly crucial in data-intensive domains, e.g., stock market and financial studies, eHealth, or environmental research. Indeed, the data deluge characteristic of eScience applications has brought about new concerns along this direction. Quality assessment methods and models have been progressively incorporated in many business environments, as well as in software engineering practices. In eScience environments, however, the many data source providers, the kinds of scientific expertise needed, and the multiple time and space scales involved in a given problem make it difficult to assess data quality. This paper is concerned with the evaluation of the quality of data managed by eScience applications. Our approach is based on data provenance, i.e., the history of the origins and transformation processes applied to a given data product. Our contributions include: (i) the specification of a framework to track data provenance and use this information to derive quality information; (ii) a model for data provenance based on the Open Provenance Model; and (iii) a methodology to evaluate the quality of a digital artifact based on its provenance. Our proposal is validated experimentally by a prototype we developed that takes advantage of the Taverna workflow system.
This paper addresses the problem of improving the quality of metadata in biological observation databases, in particular those associated with observations of living beings, and which are often used as a starting point for biodiversity analyses. Poor quality metadata lead to incorrect scientific conclusions, and can mislead experts in their analyses. Thus, it is important to design and develop methods to detect and correct metadata quality problems. This is a challenging problem because of the variety of issues concerning such metadata, e.g., misnaming of species, location uncertainty and imprecision concerning where observations were recorded. Related work is limited because it does not adequately model such issues. We propose a geographic approach based on expert-led classification of place and/or range mismatch anomalies detected by our algorithms. Our work is tested using a case study with the Fonoteca Neotropical Jacques Vielliard, one of the 10 largest animal sound collections in the world.

Data quality assessment is a key factor in data-intensive domains. The data deluge is aggravated by an increasing need for interoperability and cooperation across groups and organizations. New alternatives must be found to select the data that best satisfy users’ needs in a given context. This paper presents a strategy to provide information to support the evaluation of the quality of data sets. This strategy is based on combining metadata on the provenance of a data set (derived from workflows that generate it) and quality dimensions defined by the set’s users, based on the desired context of use. Our solution, validated via a case study, takes advantage of a semantic model to preserve data provenance related to applications in a specific domain.

The Web is witnessing an exponential growth of increasingly complex, distributed and heterogeneous documents. This hampers document exchange, as well as their annotation and retrieval. While information retrieval mechanisms concentrate on textual features (corpus analysis), annotation approaches either target specific formats or require that a document follows interoperable standards. This work presents our effort to handle these problems, providing a more flexible solution. Rather than trying to modify or convert the document itself, or to target only textual characteristics, the strategy described in this work is based on an intermediate descriptor -- the document shadow. A shadow represents domain-relevant aspects and elements of both structure and content of a given document, as defined by a user group. Rather than annotating documents themselves, it is the shadows that are annotated, thereby providing independence between annotations and document formats. Our annotations take advantage of the LOD initiative. Via annotations users can derive correlations across shadows, in a flexible way. Moreover, shadows and annotations are stored in databases, therefore allowing uniform database treatments of heterogeneous documents.
Accompanying the growth of the internet and the consequent diversification of applications and data processing needs, there has been a rapid proliferation of data and query models. While graph models such as RDF have been successfully used to integrate data from diverse origins, interaction with the integrated data is still limited by inflexible query models that cannot express concepts from multiple paradigms. In this paper we analyze data and query models typical of modern data-driven applications. We then propose an integrated query model aimed at covering a broad range of applications, allowing expressive queries that capture elements from diverse data models and querying paradigms. We employ graph models to integrate data from structured and unstructured sources. We also reinterpret as graph analysis tasks several ranking metrics typical of information retrieval (IR) systems. The metrics allow flexible correlation of data elements based on topological properties of the underlying graph. The new query model is materialized in a query language named in* (in star). We present experiments with real data that demonstrate the expressiveness and practicability of our approach.

Data quality is a common concern in a wide range of domains. Since agriculture plays an important role in the Brazilian economy, it is crucial that agricultural data be useful and of a proper level of quality for decision making, planning activities, and other processes. Nevertheless, this requirement is often not taken into account when different systems and databases are modeled. This work presents a review of data quality issues, covering efforts in agriculture and geospatial science to tackle them. The goal is to help researchers and practitioners design better applications. In particular, we focus on the different dimensions of quality and the approaches used to measure them.
This article presents an overview of the research conducted at the Laboratory of Information Systems (LIS) at the Institute of Computing, UNICAMP. Its creation, in 1994, was motivated by the need to support data-driven research within multidisciplinary projects involving computer scientists and scientists from other fields. Throughout the years, it has housed projects in many domains - in agriculture, biodiversity, medicine, health, bioinformatics, urban planning, telecommunications, and sports - with scientific results in these fields and in Computer Science, with emphasis on data management, integrating research on databases, image processing, human-computer interfaces, software engineering and computer networks. The research produced 14 PhD theses, 70 MSc dissertations, 40+ journal papers and 200+ conference papers, having been assisted by over 80 undergraduate student scholarships. Several of these results were obtained through cooperation with many Brazilian universities and research centers, as well as groups in Canada, USA, France, Germany, the Netherlands and Portugal. The authors of this article are faculty at the Institute whose students developed their MSc or PhD research in the lab. For additional details, online systems, papers and reports, see http://www.lis.ic.unicamp.br and http://www.lis.ic.unicamp.br/publications
For decades, biologists around the world have recorded animal sounds. As the number of records grows, so does the difficulty to manage them, presenting challenges to save, retrieve, share and manage sounds. These challenges are complicated by the fact that animal sound recordings have specific peculiarities, associated with the context in which the sound was recorded. For example, sounds emitted by individuals that are in groups may be different from ones emitted by isolated individuals. Though these characteristics may be relevant to biologists, they are seldom explicit in the recording metadata. This paper discusses our ongoing research on management of sound recordings, considering factors such as environmental or social contexts, which are not treated by current systems. This work exploits retrieval based on context analysis. Query parameters include context variables that are dynamically derived using public services and ontologies associated with sound recording metadata. Part of the results have been validated through a web prototype, discussed in the text.
Environmental monitoring studies present many challenges. Huge amounts of data are provided in different formats from different sources (e.g. sensor networks and databases). This paper presents a framework we have developed to overcome some of these problems, based on combining aspects of Enterprise Service Bus (ESB) architectures and Event Processing mechanisms. First, we treat integration using the ESB, and then use event processing to transform, filter and detect event patterns, where all data arriving at a given point are treated uniformly as event streams. A case study concerning data streams of meteorological stations is provided to show the feasibility of this solution.
Modern data analysis deeply relies on computational visualization tools, especially when spatial data are involved. Important efforts in governmental and private agencies are looking for patterns and insights buried in dispersed, massive amounts of data (conventional, spatiotemporal, etc.). In Visual Analytics, users must be empowered to analyze data from different perspectives, integrating, transforming, aggregating and deriving new representations of conventional as well as spatial data. However, a challenge for visual analysis tools is how to articulate such a wide variety of data models and formats, especially when multiple representations of geographic elements are involved. A usual approach is to convert data to a database - e.g., a multi-representation database - which centralizes and homogenizes them. This approach has restrictions when facing the dynamic and distributed model of the Web. In this paper we propose an on-the-fly, on-demand multi-representation data integration and homogenization approach, named Lens, as an alternative that better fits the Web. It combines a metamodel-driven approach to transform data to a unifying multidimensional and multi-representation model, with a middleware-based architecture for seamless and on-the-fly data access, tailored to Visual Analytics.
Most research data handled by biologists are kept in electronic spreadsheets. Spreadsheets have become a popular means to create data tables, which are easy to implement as isolated entities, but are inappropriate for integration with other spreadsheets or data sources and for enhanced queries, due to the informality of their implicit schemas. Several initiatives aim to interpret these implicit schemas of spreadsheets, making them explicit in order to drive the extraction and mapping of native data to open standards of interoperability. However, we observed limitations in such an interpretation process, which is detached from the spreadsheet creation context. In this paper we present a strategy for characterizing spreadsheets, centered on their creation context, and we investigate how this characterization can be used to improve an automated interpretation and mapping of their respective schemas in the Biology usage domain. The strategy presented here supports work in progress on a tool to automatically recognize spreadsheet schemas.
This paper is concerned with discussing issues associated with the emerging paradigm of collaborative scientific environments on the Web, and on challenges facing teams with complementary expertise, who work across the Web. The emphasis is on the multiple focuses in which these groups attack a problem, and how this can be approached from a spatio-temporal database perspective.
Document production tools are present everywhere, resulting in an exponential growth of increasingly complex, distributed and heterogeneous documents. This hampers document exchange, as well as their annotation, indexing and retrieval. Existing approaches to these tasks either concentrate on specific formats or require representing a document's content using interoperable standards or schemas. This work presents our effort to handle this problem. Rather than trying to modify or convert the document itself, our strategy defines an intermediate and interoperable descriptor - the shadow - that summarizes key aspects and elements of a given document, improving its annotation, indexing and retrieval process regardless of its format. Shadows can be used with different purposes, from semantic annotations and context-sensitive annotations, to content indexing and clustering.

An ever-increasing number of web-based repositories aimed at sharing content, links or metadata rely on tags informed by users to describe, classify and organize their data. The term folksonomy has been used to define this "social taxonomy", which emerges from tagging carried out by users interacting in social environments. It contrasts with the formalism and systematic creation process applied to ontologies. In our research we propose that ontologies and folksonomies have complementary roles. The knowledge systematically organized and formalized in ontologies can be enriched and contextualized by the implicit knowledge which emerges from folksonomies. This paper presents our approach to build a "folksonomized" ontology as the confluence of a formal ontology enriched with social knowledge extracted from folksonomies. The formal embodiment of folksonomies has been explored to empower content search and classification. On the other hand, ontologies are supplied with contextual data, which can improve relationship weighting and inference operations. The paper shows a tool we have implemented to produce and use folksonomized ontologies. It was used to show that search operations can be improved by this combination of ontologies with folksonomies.
The deluge of information on the Web challenges internauts to organize their references to interesting content on the Web as well as in their private off-line storage space. Having an automatically managed personal index to content acquired from the Web is useful for everybody, but critical to researchers and scholars. In this paper, we discuss concepts and problems related to organizing information through multi-faceted hierarchical categorization. We introduce the organograph as a mechanism to specify multiple views of how content is organized. Organographs can help scientists to automatically organize their documents along multiple axes, improving sharing and navigation through themes and concepts according to a particular research objective.
The replication of a web service over geographically distributed locations can improve the QoS perceived by its clients. An important issue in such a deployment is the efficiency of the policy applied to distribute client requests among the replicas. In this paper, we propose a new approach for client-based load distribution that adaptively changes the fraction of load each client submits to each service replica to try to minimize overall response times. Our results show that the proposed strategy can achieve better response times than algorithms that eagerly try to choose the best replica for each client.
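The abstract above describes a client that adaptively shifts the fraction of load it sends to each replica based on observed response times. As a minimal illustrative sketch of that idea (not the paper's actual policy or parameter names - the class, the smoothing factor `alpha` and the inverse-latency weighting are all assumptions), one could write:

```python
import random

class AdaptiveClient:
    """Client-side load distributor: a hedged sketch of adaptive
    per-replica load fractions, not the published algorithm."""

    def __init__(self, replicas, alpha=0.2):
        self.replicas = list(replicas)
        self.alpha = alpha                      # smoothing factor for response-time estimates
        self.rtt = {r: 1.0 for r in replicas}   # exponentially smoothed response times

    def pick(self):
        # probability of choosing a replica is inversely proportional
        # to its estimated response time
        inv = {r: 1.0 / self.rtt[r] for r in self.replicas}
        total = sum(inv.values())
        x, acc = random.random() * total, 0.0
        for r, w in inv.items():
            acc += w
            if x <= acc:
                return r
        return self.replicas[-1]

    def report(self, replica, observed_time):
        # blend each new observation into the running estimate,
        # so the load fraction adapts as replica performance changes
        self.rtt[replica] = (1 - self.alpha) * self.rtt[replica] + self.alpha * observed_time
```

With this weighting, a replica that consistently answers faster receives a proportionally larger share of the client's requests, while slower replicas still receive some probe traffic so the client notices when they recover.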
The Internet has become the universal support for computer applications. This increases the need for solutions that provide dependability and QoS for web applications. The replication of web servers on geographically distributed datacenters allows the service provider to tolerate disastrous failures and to improve the response times perceived by clients. A key issue for good performance of worldwide distributed web services is the efficiency of the load balancing mechanism used to distribute client requests among the replicated servers. Load balancing can reduce the need for over-provision of resources, and help tolerate abrupt load peaks and/or partial failures through load conditioning. In this paper, we propose a new load balancing solution that reduces service response times by redirecting requests to the closest remote servers without overloading them. We also describe a middleware that implements this protocol and present the results of a set of simulations that show its usefulness.
For decades, biologists around the world have recorded animal sounds. As the number of records grows, so does the difficulty to manage them, presenting challenges to save, retrieve, share and manage the sounds. This paper presents our preliminary results concerning management of large volumes of animal sound data. The paper also provides an overview of our prototype, an online environment focused on management of these data. This paper also discusses our case study, concerning more than 1 terabyte of animal recordings from Fonoteca Neotropical Jacques Vielliard, at UNICAMP, Brazil.
eScience research, in computer science, concerns the development of tools, models and techniques to help scientists from other domains to develop their own research. One problem which is common to all fields is concerned with the management of heterogeneous data, offering multiple interaction possibilities. This paper presents a proposal to help solve this problem, tailored to wireless sensor data - an important data source in eScience. This proposal is illustrated with a case study.
Sensor networks have increased the amount and variety of temporal data available, requiring the definition of new techniques for data mining. Related research typically addresses the problems of indexing, clustering, classification, summarization, and anomaly detection. There is a wide range of techniques to describe and compare time series, but they focus on series' values. This paper concentrates on a new aspect - that of describing oscillation patterns. It presents a technique for time series similarity search at multiple temporal scales, defining a descriptor that uses the angular coefficients from a linear segmentation of the curve that represents the evolution of the analyzed series. This technique is generalized to handle co-evolution, in which several phenomena vary at the same time. Preliminary experiments with real datasets showed that our approach correctly characterizes the oscillation of single time series, for multiple time scales, and is able to compute the similarity among sets of co-evolving series.
There is a growing demand for accurate information about the real environmental impact caused by cattle, accompanied by a concern for increased production of cattle related products in a sustainable manner. With the widespread adoption of RFID chips for bovine traceability and new technologies for measuring carbon dioxide in the atmosphere, it is now feasible to develop carbon cycle models that combine such factors. This presents challenges that range from data management to model specification and validation, to correlate animal movements and their impact on different biomes. This paper presents a proposal towards this goal, concerned with the creation of a framework to store and index semantic space trajectories of livestock to enable monitoring of the production of CO2.
One of the concerns in eScience research is the design and development of novel solutions to support distributed collaboration. In this context, regardless of the scientific domain, an important problem is the reproducibility of the results from scientific activities, considering the heterogeneous data involved and the specific research context. This paper presents a proposal to help solve this problem, proposing a software architecture to handle provenance issues.

eScience research, in computer science, concerns the development of tools, models and techniques to help scientists from other domains to develop their own research. One problem which is common to all is concerned with management of heterogeneous data offering multiple interaction possibilities. This paper presents a proposal to help solve this problem, tailored to wireless sensor data – an important data source in eScience. This proposal is illustrated with a case study.
A key issue for good performance of geographically replicated web services is the efficiency of the load balancing mechanism used to distribute client requests among the replicas. This work revisits the research on DNS-based load balancing mechanisms considering a SOA (Service-Oriented Architecture) scenario. In this kind of load balancing solution the Authoritative DNS (ADNS) of the distributed web service performs the role of the client request scheduler, redirecting the clients to one of the server replicas, according to some load distribution policy. This paper proposes a new policy that combines client load information and server load information in order to reduce the negative effects of DNS caching on the load balancing. We also present the results obtained through an experimental testbed built on the basis of the TPC-W benchmark.
In Content-based Image Retrieval (CBIR), accurately ranking the returned images is of paramount importance, since users consider mostly the topmost results. The typical ranking strategy used by many CBIR systems is to employ image content descriptors, so that returned images that are most similar to the query image are placed higher in the rank. While this strategy is well accepted and widely used, improved results may be obtained by combining multiple image descriptors. In this paper we explore this idea, and introduce algorithms that learn to combine information coming from different descriptors. The proposed learning to rank algorithms are based on three diverse learning techniques: Support Vector Machines (CBIR-SVM), Genetic Programming (CBIR-GP), and Association Rules (CBIR-AR). Eighteen image content descriptors (color, texture, and shape information) are used as input and provided as training to the learning algorithms. We performed a systematic evaluation involving two complex and heterogeneous image databases (Corel and Caltech) and two evaluation measures (Precision and MAP). The empirical results show that all learning algorithms provide significant gains when compared to the typical ranking strategy in which descriptors are used in isolation. We concluded that, in general, CBIR-AR and CBIR-GP outperform CBIR-SVM. A fine-grained analysis revealed the lack of correlation between the results provided by CBIR-AR and the results provided by the other two algorithms, which indicates the opportunity for an advantageous hybrid approach.
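The baseline idea of combining multiple descriptors can be made concrete with a deliberately simple fixed-weight score fusion - a stand-in sketch, not the SVM/GP/association-rule learners the paper actually studies (the function name and weights are illustrative assumptions; the learners' job is essentially to find a better combination than such fixed weights):

```python
import numpy as np

def fuse_scores(score_lists, weights):
    """Combine per-descriptor similarity scores into a single ranking
    via a weighted sum. Illustrative baseline only: the learned
    combinations (CBIR-SVM/GP/AR) replace these fixed weights."""
    scores = np.zeros(len(score_lists[0]))
    for s, w in zip(score_lists, weights):
        scores += w * np.asarray(s, dtype=float)
    # return image indices ordered from most to least similar
    return list(np.argsort(-scores))
```

An image ranked poorly by one descriptor can still surface near the top when other descriptors score it highly, which is the gain over using any single descriptor in isolation.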
This paper presents Eva, a tool for evaluating image descriptors for content-based image retrieval. Eva integrates the most common stages of an image retrieval process and provides functionalities to facilitate the comparison of image descriptors in the context of content-based image retrieval. Eva supports the management of image descriptors and image collections and creates a standardized environment to run comparative experiments using them.
Classifying Remote Sensing Images (RSI) is a hard task. There are automatic approaches whose results normally need to be revised. The identification and polygon extraction tasks usually rely on applying classification strategies that exploit visual aspects related to spectral and texture patterns identified in RSI regions. Many image descriptors proposed in the literature for content-based image retrieval purposes can be useful for RSI classification. This paper presents a comparative study to evaluate the potential of using successful color and texture image descriptors for remote sensing retrieval and classification. Seven descriptors that encode texture information and twelve color descriptors that can be used to encode spectral information were selected. We performed experiments to evaluate the effectiveness of these descriptors, considering image retrieval and classification tasks. To evaluate descriptors in classification tasks, we also propose a methodology based on a KNN classifier. Experiments demonstrate that Joint Auto-Correlogram (JAC), Color Bitmap, Invariant Steerable Pyramid Decomposition (SID) and Quantized Compound Change Histogram (QCCH) yield the best results.
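A KNN-based evaluation of a descriptor boils down to labeling each region by majority vote among its nearest training regions in that descriptor's feature space. The following is a hedged sketch of that idea under stated assumptions (Euclidean distance, simple majority vote, `k=3`) rather than the paper's exact methodology:

```python
import numpy as np

def knn_classify(train_feats, train_labels, query_feat, k=3):
    """Label an image region by majority vote among its k nearest
    training regions in descriptor space (Euclidean distance).
    Illustrative sketch of a KNN-based evaluation methodology."""
    dists = np.linalg.norm(np.asarray(train_feats, dtype=float) - query_feat, axis=1)
    nearest = np.argsort(dists)[:k]          # indices of the k closest regions
    votes = {}
    for i in nearest:
        votes[train_labels[i]] = votes.get(train_labels[i], 0) + 1
    return max(votes, key=votes.get)
```

Running this with features extracted by different descriptors over the same labeled regions, and comparing the resulting accuracies, is one way such a comparative study can rank descriptors for the classification task.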

Fish species identification is critical to the study of fish ecology and management of fisheries. Traditionally, dichotomous keys are used for fish identification. The keys consist of questions about the observed specimen. Answers to these questions lead to more questions until the reader identifies the specimen. However, such keys are incapable of adapting or changing to meet different fish identification approaches, and often do not focus upon distinguishing characteristics favored by many field ecologists and more user-friendly field guides. This makes learning to identify fish difficult for Ichthyology students. Students usually supplement the use of the key with other methods such as making personal notes, drawings, annotated fish images, and more recently, fish information websites, such as Fishbase. Although these approaches provide useful additional content, it is dispersed across heterogeneous sources and can be tedious to access. Also, most of the existing electronic tools have limited support to manage user created content, especially that related to parts of images such as markings on drawings and images and associated notes. We present SuperIDR, a superimposed image description and retrieval tool, developed to address some of these issues. It allows users to associate parts of images with text annotations. Later, they can retrieve images, parts of images, annotations, and image descriptions through text- and content-based image retrieval. We evaluated SuperIDR in an undergraduate Ichthyology class as an aid to fish species identification and found that the use of SuperIDR yielded a higher likelihood of success in species identification than using traditional methods, including the dichotomous key, fish web sites, notes, etc.
There are several applications which need support for complex objects, such as new mechanisms for managing data, creating references, links and annotations; clustering or organizing complex digital objects and their components. In this work we present a research proposal to address these issues. The objective is to specify and implement a formal and unified framework to manage multimodal complex objects in digital libraries, using the 5S formalism and Digital Content Component (DCC) aggregation.
The quest for interoperability is one of the main driving forces behind international organizations such as OGC and W3C. In parallel, a trend in systems design and development is to break down GIS functionalities into modules that can be composed in an ad hoc manner. This component-driven approach increases flexibility and extensibility. For scientists whose research involves geospatial analysis, however, such initiatives mean more than interoperability and flexibility. These efforts are progressively shielding these users from having to deal with problems such as data representation formats, communication protocols or pre-processing algorithms. Once scientists are allowed to abstract from lower level concerns, they can shift their focus to the design and implementation of the computational models they are interested in. This paper analyzes how interoperability and componentization efforts have this underestimated impact on the design and development perspective. This discussion is illustrated by the description of the design and implementation of WebMAPS, a geospatial information system to support agricultural planning and monitoring. By taking advantage of new results in the above areas, the experience with WebMAPS presents a road map to leverage system design and development by the seamless composition of distributed data sources and processing solutions.
Scientific research is producing and consuming large volumes of multimedia data at an ever growing rate. Data annotations are used, among others, to provide context information and enhance content management, making it easier to interpret and share data. However, raw multimedia data often need to go through complex processing steps before they can be consumed. During these transformation processes, original annotations from the production phase are often discarded or ignored, since their usefulness is usually limited to the first transformation step. New annotations must be made at each step, and associated with the final product, a time consuming task often carried out manually. The task of systematically associating new annotations to the result of each data transformation step is known as annotation propagation. This paper introduces techniques for structuring and propagating annotations, in parallel to the data transformation processes, thereby alleviating the overhead and decreasing the errors introduced by manual annotation. This helps the construction of new annotated multimedia data sets, preserving contextual information. The solution is based on: (i) the notion of semantic annotations; (ii) a set of transformation rules, based on ontological relations; and, (iii) workflows that deal with interrelated processing steps.

Editing natural images usually demands considerable user involvement, with segmentation being one of the main challenges. This paper describes a unified graph-based framework for fast, precise and accurate interactive image segmentation. The method divides segmentation into object recognition, enhancement and extraction. Recognition is done by the user when markers are selected inside and outside the object. Enhancement increases the dissimilarities between object and background and extraction separates them. Enhancement is done by a fuzzy pixel classifier and has a great impact on the number of markers required for extraction. With a view to minimizing user involvement, we focus this paper on a comparative study among popular classifiers for enhancement, conducting experiments with several natural images and seven users.
Geographic information systems (GIS) are increasingly using geospatial data from the Web to produce geographic information. One big challenge is to find the relevant data, a search often based on keywords or even file names. However, these approaches lack semantics. Thus, it is necessary to provide mechanisms to prepare data to help retrieval of semantically relevant data. This paper proposes an approach to attack this problem. This approach is based on semantic annotations that use geographic metadata and ontologies to describe heterogeneous geospatial data. Semantic annotations are RDF/XML files that rely on an FGDC metadata schema, filled with appropriate ontology terms, and stored in an XML database. The proposal is illustrated by a case study of semantic annotations of agricultural resources, using domain ontologies.
The Web is a huge repository of geospatial information. Efficient retrieval of this information is a key factor in planning and decision-making in many domains, including agriculture. However, standards for data annotation and exchange enable only syntactic interoperability, while semantic heterogeneity presents challenges. This work describes a framework that tackles interoperability problems via semantic annotations, which are based on multiple ontologies. The framework is being developed within a project to support agricultural planning in Brazil. The paper discusses design and implementation issues using a real case study, provides an overview of annotation mechanisms and identifies requirements for annotating agricultural data.
Assembling virtual organizations is a complex process, which can be modeled and managed by means of a multi-party contract. Such a contract must encompass seeking consensus among parties on some issues, while simultaneously allowing for competition on others. Present solutions in contract negotiation are not satisfactory because they do not accommodate such a variety of needs and negotiation protocols. This paper shows our solution to this problem, discussing how our SPICA negotiation protocol can be used to build up virtual organizations. It assesses the effectiveness of our approach and discusses the protocol's implementation.
@article{1517466,
  author    = {Mac\'{a}rio, Carla Geovana N. and Medeiros, Claudia Bauzer},
  title     = {Specification of a framework for semantic annotation of geospatial data on the web},
  journal   = {SIGSPATIAL Special},
  volume    = {1},
  number    = {1},
  year      = {2009},
  issn      = {1946-7729},
  pages     = {27--32},
  doi       = {10.1145/1517463.1517466},
  publisher = {ACM},
  address   = {New York, NY, USA}
}
This paper presents an interactive technique for remote sensing image classification. In our proposal, users are able to interact with the classification system, indicating regions which are of interest. Furthermore, a genetic programming approach is used to learn user preferences and combine image region descriptors that encode spectral and texture properties. Experiments demonstrate that the proposed method is effective and suitable for image classification tasks.
CBIR is a challenging problem both in terms of effectiveness and efficiency. In this paper, we present a flexible cluster-and-search approach that is able to reuse any previously proposed image descriptor, as long as a suitable similarity function is provided. In the clustering step, the image data set is clustered using a hybrid divisive-agglomerative hierarchical clustering technique. The obtained clusters are organized in a tree that can be traversed efficiently using the similarity function associated with the chosen image descriptors. Our experiments have shown that we can improve search-time performance by a factor of 10 or more, at the cost of a small loss in effectiveness, typically less than 15%, when compared to state-of-the-art solutions.
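The cluster-and-search idea can be sketched in a few lines. The split below is a naive farthest-pair divisive step standing in for the paper's hybrid divisive-agglomerative technique, and `euclidean` is just one example of a pluggable similarity function; all names and parameters here are illustrative, not the paper's:

```python
import math

def euclidean(a, b):
    # pluggable dissimilarity: smaller means more similar
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

class Node:
    """One node of the cluster tree over descriptor feature vectors."""
    def __init__(self, vectors, leaf_size=2):
        self.center = centroid(vectors)
        self.vectors = vectors
        self.children = []
        if len(vectors) > leaf_size:
            # naive divisive split: pick a farthest pair as seeds
            a = max(vectors, key=lambda v: euclidean(v, self.center))
            b = max(vectors, key=lambda v: euclidean(v, a))
            left = [v for v in vectors if euclidean(v, a) <= euclidean(v, b)]
            right = [v for v in vectors if v not in left]
            if left and right:
                self.children = [Node(left, leaf_size), Node(right, leaf_size)]

def search(node, query, dist):
    # descend toward the nearest cluster centre, then scan only that leaf
    while node.children:
        node = min(node.children, key=lambda c: dist(query, c.center))
    return min(node.vectors, key=lambda v: dist(query, v))
```

Searching descends toward the nearest cluster centre and scans a single leaf, which is the source of the reported speed-up: most clusters are never compared against the query, at the cost of occasionally missing the true nearest neighbour.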
This paper proposes a new texture descriptor to guide the search and retrieval in image databases. It extracts rich information from global and local primitives of textured images. At a higher level, the global macro-features in textured images are characterized by exploiting the multiresolution properties of the Steerable Pyramid Decomposition. By doing this, the global texture configurations are highlighted. At a finer level, the local arrangements of texture micro-patterns are encoded by the Local Binary Pattern operator. Experiments were carried out on the standard Vistex dataset aiming to compare our descriptors against popular texture extraction methods with regard to their retrieval accuracies. The comparative evaluations allowed us to show the superior descriptive properties of our feature representation methods.
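The Local Binary Pattern operator used at the finer level is standard and easy to sketch. The code below covers only the local micro-pattern histogram; the Steerable Pyramid macro-features are omitted, and names and the neighbour ordering are illustrative:

```python
def lbp_code(img, r, c):
    """8-neighbour Local Binary Pattern code for pixel (r, c).
    img is a 2-D list of grey levels; a neighbour >= the centre sets a bit."""
    centre = img[r][c]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]  # clockwise from top-left
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    # 256-bin histogram of LBP codes over interior pixels: the local
    # part of a texture descriptor of this kind
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```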
Content-based image retrieval (CBIR) is a challenging task. Common techniques use only low-level features. However, these solutions can lead to the so-called ‘semantic gap’ problem: images with high feature similarities may be different in terms of user perception. In this paper, our objective is to retrieve images based on color cues which may present some affine transformations. For that, we present CSIR: a new method for comparing images based on discrete distributions of distinctive color and scale image regions. We validate the technique using images with a large range of viewpoints, partial occlusion, changes in illumination, and various domains.
This paper introduces a process for developing Web GIS (Geographic Information Systems) applications. This process integrates the NDT (Navigational Development Techniques) approach with some of the Organizational Semiotic models. The use of the proposed development process is illustrated for a real application: the construction of the WebMaps system. WebMaps is a Web GIS system whose main goal is to support harvest planning in Brazil.
This paper proposes a new texture classification system, which is distinguished by: (1) a new rotation-invariant image descriptor based on Steerable Pyramid Decomposition, and (2) by a novel multi-class recognition method based on Optimum Path Forest. By combining the discriminating power of our image descriptor and classifier, our system uses small size feature vectors to characterize texture images without compromising overall classification rates. State-of-the-art recognition results are further presented on the Brodatz dataset. High classification rates demonstrate the superiority of the proposed method.
Biodiversity information systems deal with a heterogeneous set of information provided by research groups, such as the species studied, the way information is structured, and the study sites. This heterogeneity of data, users and procedures hampers the reuse and sharing of information. The goal of this work is to improve the process of querying information in biodiversity systems. To this end, we propose a module that preprocesses a user's (scientist's) query, aggregating information from ontologies to disambiguate the query. The work assumes that the data to be queried are distributed across repositories on the Web, which are maintained by groups of scientists and have their contents accessible through Web services.
With the evolution of networks toward the Next Generation Network (NGN) model, in which the transport, control and application layers are separated, a subsystem that enables and arbitrates the requests of applications and session controllers, according to the Quality of Service resources (bandwidth, priority, etc.) available in the different segments of the transport network, becomes a key element. This work introduces the functionalities defined by the international NGN standards, proposing an implementation architecture that is the object of applied research to explore technologies and conceive studies for the development of solutions that provide integrated management of heterogeneous network resources and meet the particular needs of the Brazilian telecommunications market.

Huge image collections have been created, managed and stored in image databases. Given the large size of these collections, it is essential to provide efficient and effective mechanisms to retrieve images. This is the objective of so-called content-based image retrieval (CBIR) systems. Traditionally, these systems are based on objective criteria to represent and compare images. However, users of CBIR systems tend to use subjective elements to compare images, and the use of these elements has improved the effectiveness of content-based image retrieval systems. This paper discusses approaches that incorporate semantic information into the content-based image retrieval process, highlighting new challenges in this area.
This paper presents a new relevance feedback method for content-based image retrieval using local image features. This method adopts a genetic programming approach to learn user preferences and combine the region similarity values in a query session. Experiments demonstrate that the proposed method yields more effective results than the Local Aggregation Pattern (LAP)-based relevance feedback technique.
Scientific research is producing and consuming large volumes of multimedia data at an ever growing rate. Annotations to the data help associate context and enhance content management, making data easier to interpret and share. However, raw data often need to go through complex processing steps before they can be consumed. During these transformation processes, original annotations from the production phase are often discarded or ignored, since their usefulness is usually limited to the first transformation step. New annotations must then be associated with the final product, a time-consuming task often carried out manually. Systematically associating new annotations with the result of each data transformation step is known as annotation propagation. This paper introduces techniques for structuring annotations by applying references to ontologies, and for automatically transforming these annotations along with the data transformation processes. This helps the construction of new annotated multimedia data sets, preserving contextual information. The solution is based on: (i) the notion of semantic annotations; and (ii) a set of transformation rules based on ontological relations.
This paper analyzes how interoperability and componentization efforts in the geospatial domain have an underestimated impact on the user perspective, directly affecting model development. This discussion is illustrated by the description of the design and implementation of WebMAPS, a geospatial information system to support agricultural planning and monitoring.
In May 2006, the Brazilian Computer Society proposed five Grand Research Challenges in Computer Science in Brazil. The society's goal was to foster long-term planning and research in computer science, enhance cooperation with other scientific domains, and provide input to public R&D policymakers in Brazil. This paper presents the five challenges under a global perspective, showing how they can benefit from cooperation with other research fields, and discussing CS research trends in Brazil. The paper also discusses how the challenges were elicited, and future directions.
Research in biodiversity associates data on living beings and their habitats, constructing sophisticated models and correlating several kinds of heterogeneous data. Such data are provided by research groups with different vocabularies, methodologies and goals, which hampers their cooperation. Ontologies are being proposed as one of the means to solve heterogeneity problems. However, this gives birth to new challenges in managing and sharing ontologies. This dissertation specified and developed a new kind of Web Service, whose goal is to help solve such problems. The service supports a wide range of operations on ontologies, and was implemented and validated with real case studies in biodiversity, for large ontologies. The dissertation is available in the UNICAMP digital library.
Supply chains are composed of distributed, heterogeneous and autonomous elements, whose relationships are dynamic. Agricultural supply chains, in particular, have a number of distinguishing features - e.g., they are characterized by strict regulations to ensure safety of food products, and by the need for multi-level traceability. Contracts in such chains need sophisticated specification and management of chain agents -- their roles, rights, duties and interaction modes -- to ensure auditability. This paper proposes a framework that attacks these problems, which is centered on three main elements to support and manage agent interactions: Contracts, Coordination Plans (a special kind of business process) and Regulations (the business rules). The main contributions are: i) a contract model suitable for agricultural supply chains; ii) a negotiation protocol able to produce such contracts, which allows a wide range of negotiation styles; iii) negotiation implementation via Web services. As a consequence, we maintain independence between business processes and contract negotiation, thereby fostering interoperability among chain processes.
Brazil has one of South America's largest information technology (IT) communities. One hundred million people voted electronically for President and Congress in 2004, and 97 percent of all income tax declarations are submitted via the Internet. Over 20,000 students graduate every year in computer science alone, and two of the federal government's four industrial priorities are related to IT --- software and semiconductors. Though women represent 60 percent of the country's college graduates, less than 5 percent choose computer science as a major. Programs to foster gender equality have little intersection with the national digital inclusion program. This paper points out actions that may be considered to allow Brazilian women to become full citizens of the information society. These actions concern formal and informal means of education, and visibility and advocacy.

Traffic data coming from sensor networks have prompted a wide range of research issues related to Transportation Information Systems. These data are usually represented by large and complex spatio-temporal series. This paper presents a new approach to managing raw data coming from static georeferenced sensors. Our work combines analytic methods for processing sensor data with a proposed architecture for an information system dedicated to road traffic. It is being conducted within a project that uses real data generated by 1000 sensors over 3 years in a large French city.
Scientific models are increasingly dependent on processing large volumes of streamed sensing data from a wide range of sensors, from ground-based devices to satellite-borne infrared instruments. The proliferation, variety and ubiquity of those devices have added new dimensions to the problem of data handling in computational models. This raises several issues, one of which -- providing means to access and process these data -- is tackled by this paper. Our solution involves the design and implementation of a framework for sensor data management, which relies on a specific component technology -- DCC. DCCs homogeneously encapsulate individual sensors, sensor networks and sensor data archival files. They also implement facilities for controlling data production, integration and publication. As a result, developers need not concern themselves with sensor particularities, dealing instead with uniform interfaces to access data, regardless of the nature of the data providers.
Biodiversity research requires associating data about living beings and their habitats, constructing sophisticated models and correlating all kinds of information. Data handled are inherently heterogeneous, being provided by distinct (and distributed) research groups, which collect these data using different vocabularies, assumptions, methodologies and goals, and under varying spatio-temporal frames. Ontologies are being adopted as one of the means to alleviate these heterogeneity problems, thus helping cooperation among researchers. While ontology toolkits offer a wide range of operations on ontologies, they are self-contained and cannot be accessed by external applications. Thus, the many proposals for adopting ontologies to enhance interoperability in application development are based on the use of either ontology servers or ontology frameworks. The latter support many functions, but impose application recoding whenever ontologies change, whereas the former support ontology evolution, but only for a limited set of functions. This paper presents Aondê -- a Web service geared towards the biodiversity domain that combines the advantages of both frameworks and servers, supporting flexible ontology sharing and management on the Web. By clearly separating storage concerns from semantic issues, the service provides independence between ontology evolution and the applications that need ontologies. The service provides a wide range of basic operations for the creation, storage, management, analysis and integration of multiple ontologies. These operations can be repeatedly invoked by client applications to construct more complex manipulations. Aondê has been validated for real biodiversity case studies.
Sensor networks have increased the amount and variety of temporal data available, requiring the definition of new techniques for data mining. Related research typically addresses the problems of indexing, clustering, classification, summarization, and anomaly detection. They present many ways for describing and comparing time series, but they focus on their values. This paper concentrates on a new aspect - that of describing oscillation patterns. It presents a technique for time series similarity search, based on multiple temporal scales, defining a descriptor that uses the angular coefficients from a linear segmentation of the curve that represents the evolution of the analyzed series. Preliminary experiments with real datasets showed that our approach correctly characterizes the oscillation of time series.
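A minimal sketch of the oscillation-descriptor idea, assuming non-overlapping windows and a least-squares slope per segment (the paper's exact segmentation may differ); the window sizes play the role of the multiple temporal scales, and all names are illustrative:

```python
def slope_descriptor(series, scales=(2, 4)):
    """Describe a time series by the slopes (angular coefficients) of a
    piecewise-linear segmentation, computed at several window sizes.
    Each scale yields one list of per-window least-squares slopes."""
    descriptor = []
    for w in scales:
        slopes = []
        for start in range(0, len(series) - w + 1, w):
            ys = series[start:start + w]
            xs = list(range(w))
            mx = sum(xs) / w
            my = sum(ys) / w
            # ordinary least-squares slope of ys against xs
            num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            den = sum((x - mx) ** 2 for x in xs)
            slopes.append(num / den)
        descriptor.append(slopes)
    return descriptor
```

Two series with similar oscillation but different absolute values produce similar descriptors, which is the point of comparing slopes rather than raw values.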
This paper considers the problems of sensor data publication, taking advantage of research on components and Web service standards. Sensor data are widely used in scientific experiments -- e.g., for model validation, environment monitoring, and calibrating running applications. Heterogeneity in sensing devices hampers effective use of their data, requiring new solutions for publication mechanisms. Our solution is based on applying a specific component technology, the Digital Content Component (DCC), which is capable of uniformly encapsulating data and software. Sensor data publication is tackled by extending DCCs to comply with geospatial standards for Web services from the OGC (Open Geospatial Consortium). Using this approach, Web services can be implemented by DCCs, with the publication of sensor data following standards. Furthermore, this solution allows client applications to request the execution of pre-processing functions before data are published. The approach enables scientists to share, find, process and access geospatial sensor data in a flexible and homogeneous manner.
This paper describes challenges and results of the WebMAPS project, a multidisciplinary effort involving the agricultural and computer sciences, under development at UNICAMP. Its goal is to develop a platform based on Web services for agro-environmental planning. This requires leading-edge research on the specification and implementation of software with access to several kinds of distributed information: satellite images, sensor data, agricultural production data and geographic data.
Tackling biodiversity information is essentially a distributed effort. The data handled are inherently heterogeneous, being provided by distinct research groups using different vocabularies. Queries in biodiversity systems require correlating these data, using many kinds of geographic, biological and ecological knowledge. Available biodiversity systems can only cope with part of these queries, and end users must perform several manual tasks to derive the desired correlations, because of semantic mismatches among data sources and the lack of appropriate operators. This paper presents a solution based on Web services to meet these challenges. It relies on ontologies to retrieve query contexts and uses the terms of a context to discover suitable sources in data repositories. This approach is being tested using real data, with new services.
To carry out ecologically relevant biodiversity research, one must collect chunks of information on species and their habitats from a large number of institutions and correlate them using geographic, biological and ecological knowledge. The distribution and heterogeneity inherent to biodiversity data pose several challenges, such as how to find and merge relevant information on the Web, and how to process a variety of ecological and spatial predicates. This paper presents a framework that exploits advances in data interoperability and Semantic Web technologies to meet these challenges. The solution relies on ontologies and annotated repositories to support data sharing, discovery and collaborative biodiversity research. A prototype using real data has implemented part of the framework.
E-contracting, i.e., establishing and enacting electronic contracts, has become important because of technological advances (e.g., the availability of Web services) and more open markets. However, the establishment of an e-contract is complicated and error prone. There are multiple negotiation styles, ranging from auctions to bilateral bargaining. This paper provides an approach for modeling multi-party negotiation protocols in colored Petri nets. It is shown how different negotiation styles can be modeled in a unified and consistent way. Moreover, CPN Tools is used to analyze the resulting colored Petri nets. Simulation can be used for both validation and performance analysis, while state-space analysis can be used to discover anomalies in various multi-party negotiation protocols.
There is a world wide effort to create infrastructures that support multidisciplinary, collaborative and distributed work in scientific research, giving birth to the so-called e-Science environments. At the same time, the proliferation, variety and ubiquity of sensing devices, from satellites to tiny sensors are making huge amounts of data available to scientists. This paper presents a framework with a twofold solution: (i) using a specific kind of component -- DCC -- for homogeneous sensor data acquisition; and (ii) using scientific workflows for flexible composition of sensor data and manipulation software. We present a solution for publishing sensor data tailored to distributed scientific applications.
The evolution of geographic phenomena has been one of the concerns of spatiotemporal database research. However, in a large spectrum of geographical applications, users need more than a mere representation of data evolution. For instance, in urban management applications - e.g. cadastral evolution - users often need to know why, how, and by whom certain changes have been performed as well as their possible impact on the environment. Answers to such queries are not possible unless supplementary information concerning real world events is associated with the corresponding changes in the database and is managed efficiently. This paper proposes a solution to this problem, which is based on extending a spatiotemporal database with a mechanism for managing documentation on the evolution of geographic information. This solution has been implemented in a GIS-based prototype, which is also discussed in the paper.

Tensor scale is a morphometric parameter that unifies the representation of local structure thickness, orientation, and anisotropy, which can be used in several computer vision and image processing tasks. In this paper, we exploit this concept for binary images and propose a shape descriptor that encodes region and contour properties in a very efficient way. Experimental results are provided, showing the effectiveness of the proposed descriptor, when compared to other relevant shape descriptors, with regard to their use in content-based image retrieval systems.

This paper presents a solution for managing spatio-temporal data in a GIS database. This solution allows temporal data and alternatives to be stored and handled efficiently, using a version mechanism. It can be used for different types of GIS-based applications, such as urban planning, environmental control and utility management.

Bioinformatics activities are growing all over the world, with a proliferation of data and tools. This brings new challenges: how to understand and organize these resources, and how to provide interoperability among tools to achieve a given goal. We defined and implemented a framework to help meet some of these challenges, built on three pillars: the use of Web services as a basic unit, the notion of the Semantic Web to improve interoperability at the syntactic and semantic levels, and the use of scientific workflows to coordinate the services to be executed, including their interdependencies and service orchestration.
We present a preliminary description and results of a system to help the curation of genome assembly and annotation. Standard tools are used for these tasks, and our methodology focuses on user guidance, data visualization and integration, and data browsing aspects.
This paper presents an engineering experience in building a Semantic Web compliant system for a scientific application: agricultural zoning. First, we define the concept of ontological cover and a set of relationships between such covers. These definitions, based on domain ontologies, can be used, for example, to support the discovery of services on the Web. Second, we propose a semantic acyclic restriction on ontologies which enables the efficient comparison of ontological covers. Third, we present different engineering solutions to build ontology views satisfying the acyclic restriction in a prototype. Our experimental results unveil some limitations of current Semantic Web technology in handling large data volumes, and show that combining such technology with traditional data management techniques is an effective way to achieve highly functional and scalable solutions.
With the phenomenal growth of the WWW, rich data sources on many different subjects have become available online. Some of these sources store daily facts that often involve textual geographic descriptions. These descriptions can be perceived as indirectly georeferenced data -- e.g., addresses, telephone numbers, zip codes and place names. Under this perspective, the Web becomes a large geospatial database, often providing up-to-date local or regional information. In this work we focus on using the Web as an important source of urban geographic information and propose to enhance urban Geographic Information Systems (GIS) using indirectly georeferenced data extracted from the Web. We describe an environment that allows the extraction of geospatial data from Web pages, converts them to XML format, and uploads the converted data into spatial databases for later use in urban GIS. The effectiveness of our approach is demonstrated by a real urban GIS application that uses street addresses as the basis for integrating data from different Web sources, combining these data with high-resolution imagery.
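A toy sketch of the extraction step: pulling street addresses out of page text with a deliberately simplistic, hypothetical pattern and emitting them as XML. A real extractor would need far more robust rules and gazetteer support; the pattern and element names below are illustrative only:

```python
import re
from xml.etree.ElementTree import Element, SubElement, tostring

# Hypothetical pattern for Brazilian-style street addresses
# ("Rua X, 123"); real-world extraction needs much richer rules.
ADDRESS_RE = re.compile(r'\b(Rua|Av\.|Avenida|Pra[cç]a)\s+([^,<]+),\s*(\d+)')

def addresses_to_xml(page_text):
    """Extract candidate addresses from page text and serialize as XML."""
    root = Element('addresses')
    for kind, street, number in ADDRESS_RE.findall(page_text):
        addr = SubElement(root, 'address')
        SubElement(addr, 'street').text = f'{kind} {street}'.strip()
        SubElement(addr, 'number').text = number
    return tostring(root, encoding='unicode')
```

The resulting XML can then be geocoded and loaded into a spatial database, which is the role the environment described in the abstract plays.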
Content-Based Image Retrieval (CBIR) presents several challenges and has been the subject of extensive research in many domains, such as image processing and database systems. Database researchers are concerned with indexing and querying, whereas image processing experts worry about extracting appropriate image descriptors. Comparatively little work has been done on designing user interfaces for CBIR systems. This, in turn, has a profound effect on these systems, since the concept of image similarity is strongly influenced by user perception. This paper describes an initial effort to fill this gap, combining recent research in CBIR and Information Visualization, studied from a Human-Computer Interface perspective. It presents two visualization techniques, based on Spiral and Concentric Rings, implemented in a CBIR system to explore query results. The approach is centered on keeping the user's focus on both the query image and the most similar retrieved images. Experiments conducted so far suggest that the proposed visualization strategies improve system usability.
Traditional techniques for tracking data provenance have difficulty adapting to the dynamics of the Web. This paper proposes a scheme for provenance estimation, based on domain ontologies. This scheme is part of the POESIA approach for multi-step integration of semi-structured data. The ontologies used for tracking provenance also help to describe, discover, reuse and integrate data and services. In contrast to traditional techniques, this scheme derives data provenance with fewer annotations at the extensional level and thus lower maintenance costs. Additionally, it promotes the use of ontologies to categorize and correlate scopes of data sets, thereby capturing the operational semantics of data integration processes.

This paper presents a collaborative model for agricultural supply chains that supports negotiation, renegotiation, coordination and documentation mechanisms, adapted to situations found in this kind of supply chain -- such as return flows and composite regulations. This model comprises basic building blocks and elements to support a chain's dynamic execution. The model is supported by an architecture where chain elements are mapped to Web Services and their dynamics to service orchestration. Model and architecture are motivated by a real case study of dairy supply chains.

Web GIS applications can be found in many domains. The quality of an application's interface determines not only the usability of the application, but also the possibilities offered to its users. This work investigates aspects of interface quality for Web GIS applications. The approach adopts an inspection evaluation based on ISO 9241. Preliminary results show the effectiveness of such an approach to user interface evaluation as a complement to tests with users.
Geographical Information Systems (GIS) allow the manipulation, management and visualization of georeferenced data. Interest in GIS applications has increased in recent years. Currently, Web GIS applications make geographic information dispersed across different places available through the Internet. There are several categories of GIS applications, at different scales and in different application domains, ranging from urban applications to environmental problems. The importance of Web GIS for the agricultural domain comes from the fact that they serve as a useful tool for users who work directly or indirectly in agriculture: agriculturists, cooperatives, government agencies. Considering the strategic value of these systems and the wide range of prospective users, this work presents a survey of Web GIS applications with emphasis on the agricultural domain, and investigates user-system interaction aspects in these applications.

Projects using geographic information tools involve a large variety of data objects, represented in different formats. Many efforts pursue standards to represent each kind of data object, and to achieve interoperability between geographic information tools. The proliferation of data and tools raises the need for their reuse, a need that can be extended to project reuse. This work presents a proposal to reuse geographic information projects based on a model called the digital content component. This model can represent all elements involved in a project -- including software components -- and their relationships, in an open, homogeneous format.
The Semantic Web pursues interoperability at the syntactic and semantic levels, to face the proliferation of data files with different purposes and representation formats. One challenge is how to represent such data so that users and applications can easily find, use and combine them. The paper proposes an infrastructure to meet those goals. The basis of the proposal is the notion of digital content components, which extends the Software Engineering notion of a software component. The infrastructure offers tools to combine and extend these components upon user request, managing them within dynamic repositories. The infrastructure adopts XML and RDF standards to foster interoperability, composition, adaptation and documentation of content data. This work was motivated by reuse needs observed in two specific application domains: education and agro-environmental planning.
Spatial database systems have been an active area of research over the past 20 years. A large number of research efforts have appeared in the literature, aimed at effective modelling of spatial data and efficient processing of spatial queries. This book investigates several aspects of a spatial database system, and includes recent research efforts in this field. More specifically, some of the topics covered are: spatial data modelling; indexing of spatial and spatio-temporal objects; data mining and knowledge discovery in spatial and spatio-temporal databases; management issues; and query processing for moving objects. The reader will therefore be able to become acquainted with several important issues that the research community is dealing with. Moreover, each chapter is self-contained, and it is easy for the non-specialist to grasp the main issues. The authors of the book’s chapters are well-known researchers in spatial databases, who have made significant contributions to the spatial database literature. The chapters of this book provide an in-depth study of current technologies, techniques and trends in spatial and spatio-temporal database systems research. Each chapter has been carefully prepared by the contributing authors in order to conform to the book’s requirements.
We present the tensor scale descriptor (TSD), a shape descriptor for content-based image retrieval, registration, and analysis. TSD exploits the notion of local structure thickness, orientation, and anisotropy as represented by the largest ellipse centered at each image pixel and lying within the same homogeneous region. The proposed method uses the normalized histogram of the local orientation (the angle of the ellipse) at regions of high anisotropy and of thickness within a certain interval. It is shown that TSD is invariant to rotation and to some reasonable level of scale change. Experimental results with a fish database are presented to illustrate and validate the method.
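The descriptor above reduces, in essence, to a normalized histogram of ellipse orientations taken only at pixels that pass anisotropy and thickness filters. The sketch below illustrates that final histogram step only; the function name, bin count, and thresholds are illustrative assumptions, not values from the paper, and computing the per-pixel ellipses themselves is out of scope here.

```python
import numpy as np

def orientation_histogram(orientation, anisotropy, thickness,
                          n_bins=36, aniso_min=0.5, thick_range=(2.0, 20.0)):
    """TSD-style sketch: normalized histogram of local ellipse orientations,
    restricted to pixels of high anisotropy and thickness in a given interval.
    All thresholds here are hypothetical illustration values."""
    mask = ((anisotropy >= aniso_min)
            & (thickness >= thick_range[0])
            & (thickness <= thick_range[1]))
    # Ellipse orientation is axial, so angles live in [0, 180) degrees.
    angles = orientation[mask] % 180.0
    hist, _ = np.histogram(angles, bins=n_bins, range=(0.0, 180.0))
    total = hist.sum()
    return hist / total if total else hist.astype(float)
```

Because the histogram is over orientations relative to a fixed bin grid, rotation invariance would still require aligning or circularly shifting the histogram before comparison, which is omitted in this sketch.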
In this paper, we propose a novel framework using Genetic Programming to combine image database descriptors for content-based image retrieval (CBIR). Our framework is validated through several experiments involving two image databases and specific domains, where the images are retrieved based on the shape of their objects.
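In Genetic Programming, each individual is an expression tree; for descriptor combination, the leaves would be the per-descriptor distances between two images, and the tree yields a combined distance. The sketch below shows only the tree-evaluation step under that assumption; the node encoding, the operator set, and the example individual are hypothetical, and fitness evaluation (e.g., retrieval precision) and the evolutionary loop are omitted.

```python
import math

# Hypothetical node encoding: ('d', i) reads descriptor distance i,
# ('const', c) is a constant, and ('+', l, r) / ('*', l, r) / ('sqrt+', l, r)
# are the internal operators of the expression tree.
def evaluate(tree, dists):
    kind = tree[0]
    if kind == 'd':
        return dists[tree[1]]
    if kind == 'const':
        return tree[1]
    a, b = evaluate(tree[1], dists), evaluate(tree[2], dists)
    if kind == '+':
        return a + b
    if kind == '*':
        return a * b
    if kind == 'sqrt+':                    # protected square root of the sum
        return math.sqrt(abs(a + b))
    raise ValueError(f"unknown node kind: {kind}")

# One illustrative individual combining two descriptor distances:
individual = ('+', ('*', ('const', 0.7), ('d', 0)),
                   ('sqrt+', ('d', 1), ('const', 0.0)))
```

A GP run would mutate and recombine such trees, keeping the individuals whose combined distance ranks relevant images highest.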
There is a wide range of environmental applications requiring sophisticated management of several kinds of data, including spatial data and images of living beings. However, available information systems offer very limited support for managing such data in an integrated manner. This thesis provides a solution to combine these query requirements, which takes advantage of current digital library technology to manage networked collections of heterogeneous data in an integrated fashion. The research contributes to solving problems in the specification and implementation of biodiversity information systems that manage images of species, textual descriptions and spatial data in an integrated way.
This article proposes the use of image content, keywords and ontologies to improve the image annotation and retrieval processes through the enhancement of the user’s knowledge of an image database. It proposes an architecture of a flexible system capable of dealing with multiple ontologies and multiple image content descriptors to help these tasks. The validation of the idea is being done through the implementation, in Java, of the software OntoSAIA.
Web GIS applications have received marked attention in recent years, since geographic information can be visualized and manipulated in different places, by different profiles of users, over the Internet. This increases the implementation complexity of GIS applications, both with regard to functional aspects and to computer-human interface aspects. The goal of this work is to illustrate the concepts and techniques of usability in the context of interfaces for Web GIS applications, by means of a case study of usability testing for these applications.
Web applications enable users with different profiles and needs to access information from diversified locations and with different access tools. Beyond the aspects already discussed in work from the Software Quality domain, the accessibility of information and the flexibility of the Internet have been considered increasingly important. Thus, considering accessibility as an important quality attribute for Web applications, in this paper we investigate the subject in the context of Geographic Information Systems on the Web. Preliminary results of accessibility evaluations of some WebGIS applications show that this domain presents several challenges to be coped with in the design of their user interfaces.
Requirements Engineering (RE) is the process of discovering the purpose of a prospective software system, by identifying stakeholders and their needs, and documenting these in a form that is suitable for analysis, communication, and subsequent implementation. Requirements elicitation is closely related to, and even interleaved with, other RE activities such as modeling, analysis & negotiation, and communication of requirements. RE is a multidisciplinary and human-centered activity. This paper presents a participatory approach to requirements elicitation that deals with functional and non-functional requirements, considering the social, political, cultural and ethical issues involved in understanding the problem during the RE process. The proposed approach is theoretically grounded in methods and models from Organizational Semiotics (OS). It is illustrated with a case study related to the development of a Geographical Information Systems application on the Web (Web GIS). Results of the case study allowed us to observe the contribution of OS to the proposed approach, including elements to inform the design of the system's user interface.
The Semantic Web has opened new horizons in exploring Web functionality. One of the many challenges is to proactively support the reuse of digital artifacts stored in repositories all over the world. Our goal is to contribute towards this issue, proposing a mechanism for describing and discovering artifacts called Digital Content Components (DCCs). DCCs are self-contained stored entities that may comprise any digital content, such as pieces of software, multimedia or text. Their specification takes advantage of Semantic Web standards and ontologies, both of which are used in the discovery process. DCC construction and composition procedures naturally lend themselves to pattern-matching and subsumption-based search. Thus, many existing methods for Web searching can be extended to look for reusable artifacts. We validate the proposal by discussing its implementation for agro-environmental planning.
In this demo proposal, we describe our prototype application, SIERRA, which combines text-based and content-based image retrieval and allows users to link together image content of varying document granularity with related data like annotations. To achieve this, we use the concept of superimposed information (SI), which enables users to (a) deal with information of varying granularity (sub-document to complete document), and (b) select or work with information elements at sub-document level while retaining the original context.
This paper proposes a performance testing process model for Web GIS applications. The model considers the use cases that are most critical, or that pose the greatest risk, to system performance when creating test scenarios. In addition, it provides for the use of free tools to automate steps of the evaluation process. The model was applied to the WebMaps project, a Web GIS application whose purpose is to assist its users in agricultural planning based on regions of interest. The preliminary results obtained indicate that the tests were useful in identifying problems in the system's preliminary architecture.

Exploring services for digital libraries (DLs) include two major paradigms, browsing and searching, as well as other services such as clustering and visualization. In this paper, we formalize and generalize DL exploring services within a DL theory. We develop theorems to indicate that browsing and searching can be converted or mapped to each other under certain conditions. The theorems guide the design and implementation of exploring services for an integrated archaeological DL, ETANA-DL. Its integrated browsing and searching can support users in moving seamlessly between these operations, minimizing context switching, and keeping users focused. It also integrates browsing and searching into a single visual interface for DL exploration. A user study to evaluate ETANA-DL's exploring services helped validate our hypotheses.
The problem of access control in databases consists of determining when (and if) users or applications can access stored data, and what kind of access they are allowed. This paper discusses this problem for geographic databases, where constraints imposed on access control management must consider the spatial location context. The model and solution provided are motivated by problems found in AM/FM applications developed in the management of telephone infrastructure in Brazil, in a real life situation.
This paper presents a real-time system to guide search and retrieval in fingerprint image databases, considering both retrieval accuracy and speed. For that purpose, we use multiresolution-based feature extraction and indexing methods that exploit the textural information inherent to fingerprint images. The extracted feature vectors are used to compute the distance between the query fingerprint image and all fingerprints in the database, and the N most similar images are then retrieved. The focus of this work is to study the utility of multiresolution transforms in the domain of fingerprint recognition.
Two kinds of fingerprint identification approaches have been proposed in the literature to reduce the number of one-to-many comparisons during fingerprint image retrieval, namely exclusive and continuous classification. Although exclusive classification approaches reduce the number of comparisons, they present some shortcomings, including ambiguous fingerprint classification and an unbalanced distribution of fingerprints across classes. On the other hand, continuous classification approaches have not been studied further. In this context, we propose an original continuous approach to guide search and retrieval in fingerprint image databases, considering both effectiveness and retrieval speed. For that purpose, we use feature extraction and indexing methods that exploit the textural and directional information contained in fingerprint images. Preliminary results of our work involve a comparative study of several textural image descriptors obtained by combining different types of the Wavelet Transform with similarity measures. From our experiments we can conclude that the best retrieval accuracy was achieved by combining Gabor Wavelets (GWs) with the Square Chord similarity measure. Furthermore, the presence of noise and distortions in fingerprint images affected the overall retrieval accuracy.
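The Square Chord measure mentioned above is a standard distance over non-negative feature vectors: the sum of squared differences of the square roots of corresponding components. A minimal sketch of using it to rank a database against a query follows; the feature vectors would come from Gabor wavelet responses, which are not computed here, and the function and variable names are our own.

```python
import math

def square_chord(u, v):
    """Square Chord distance between two non-negative feature vectors
    (e.g., wavelet energy features): sum_i (sqrt(u_i) - sqrt(v_i))^2."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(u, v))

def rank(query, database):
    """Return database keys ordered by increasing distance to the query,
    i.e., most similar fingerprints first."""
    return sorted(database, key=lambda k: square_chord(query, database[k]))
```

In a continuous-classification setting, this ranking replaces a hard class assignment: the query is compared against the whole (indexed) database and only the top-N candidates go on to detailed matching.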
Supply chains present many research challenges in Computing, such as the modeling of their processes, communication problems between their components, logistics, and process management. This paper presents a supply chain traceability model that relies on a Web service-based architecture to ensure interoperability. Geared towards assisting quality control in the agricultural domain, the model allows products, processes and services to be traced along the chain. The model has been validated on real-life case studies and the Web service implementation is under way.

We present a new interface for Object-Oriented Database Management Systems (OODBMSs). The GOODIES system combines and expands the functions of many existing interface systems, introducing some new concepts for improved browsing in an OODBMS. The implementation of GOODIES proposes a new approach to database interface development: instead of being strongly dependent on the underlying DBMS, GOODIES is based on the main features of the object-oriented data model. The system design is based on an internal model and an external model. The internal model defines the relationships that bind the interface to the DBMS, and is fully described in [Oli92]. The external model determines the possible interactions between the user and the interface system. This paper describes the concepts of the external model of the GOODIES system.

Biodiversity research requires associating data about living beings and their habitats, integrating information ranging from geographical features to domain specifications, often through ontologies. In this context arise the so-called Biodiversity Information Systems, new management solutions that allow researchers to analyze species characteristics and their interactions. The goal of this project is to specify and develop an ontology Web service that can be used by different biodiversity systems. The main contributions of this work are: the specification of the requirements of an ontology service; and the specification and implementation of an ontology server.

We present a simple data model that can be used as a building block in a comparative genomics information system for prokaryotic genomes. The model is extensible and flexible, and has as its main entities the organism and the gene family. Existing systems tend to focus either on organisms or on gene families. We have applied the model to a set of eight bacterial genomes, and briefly describe the resulting system.

Scientists have traditionally shared data, experiments and research results. Now, they continue to do this via electronic networks and the Internet, but often without an appropriate framework. One possible approach to this problem is coordinating cooperation via scientific workflows on the Web. Our research contributes to these efforts in two directions: proposal of a model compliant with Web standards to store workflow components in databases and publish them on the Web; and development of a set of Web-based tools to specify, edit and compose workflow components.

Traceability and provenance are often used interchangeably in eScience, being associated with the need scientists have to document their experiments, and so allow experiments to be checked and reproduced by others. These terms have, however, different meanings: provenance is more often associated with data origins, whereas traceability concerns the interlinking and execution of processes. This paper proposes a set of mechanisms to deal with this last aspect; the solution is based on database research combined with scientific workflows, plus domain-specific knowledge stored in ontology structures. This meets a need of bioinformatics laboratories, where the majority of computer systems do not support traceability facilities. These mechanisms have been implemented in a prototype, and an example using the genome assembly problem is given.
Web services represent a relevant technology for interoperability. An important step toward the development of applications based on Web services is the ability of selecting and integrating heterogeneous services from different sites. When there is no single service capable of performing a given task, there must be some way to adequately compose basic services to execute this task. The manual composition of Web services is complex and susceptible to errors because of the dynamic behavior and flexibility of the Web. This paper describes and compares AI planning solutions to Web service automatic composition. As a result of this comparison, it proposes an architecture that supports service composition, and which combines AI planning with workflow mechanisms.

There is a wide range of scientific application domains requiring sophisticated management of spatio-temporal data. However, existing database management systems offer very limited (if any at all) support for managing such data. Thus, it is left to the researchers themselves to repeatedly code this management into each application. Besides being a time consuming task, this process is bound to introduce errors and increase the complexity of application management and data evolution. This paper addresses this very point. We present an extensible framework, based on extending an object-oriented database system, with kernel spatio-temporal classes, data structures and functions, to provide support for the development of spatio-temporal applications. Even though the paper’s arguments are centered on geographic applications, the proposed framework can be used in other application domains where spatial and temporal data evolution must be considered (e.g., Biology).

Interoperability in GIS is an issue of growing importance, due to the increase in the number and volume of available data sources and to the exponential expansion of new applications and systems. Research in this area involves solutions directed towards the different layers of an information system (interoperability based on common interface design, process interoperability, or interoperability through data). The goal of this paper is to point out issues concerning interoperability at the data level. In particular, the text analyses issues related to the quality of geographic data as an additional dimension that must be taken into consideration in cases of data migration and integration.
This paper presents a methodology for constructing a federated database infrastructure to help the integration of heterogeneous data sources, one that takes legacy data into account. The methodology considers different kinds of data sources and systems to be combined, and gives guidelines to integrate the data in each situation. Its last step consists of an algorithm that produces mappings from queries on the federated system to sets of queries on the databases that participate in the federation. The methodology was validated by a case study on databases and legacy systems of the municipal administration of Paulínia, SP.
This paper presents the modeling and implementation aspects of a metadata database for the information system of the BIOTA/FAPESP research program. This program's goal is cooperation among biodiversity researchers in the State of Sao Paulo, Brazil, thus helping to maintain and create environmental protection programs within the State. The information system, under development, shall integrate data from the different research groups and foster the dissemination of their work. This information system is unique in several aspects, including the diversity of data managed and the spectrum of users. The metadata database is the system's component responsible for the high-level description of the various biodiversity data gathered by the researchers. This paper concentrates on the metadata standard developed for this information system, and on the database implementation aspects.

Technological changes impose a constant evolution on all kinds of artifacts, and require new solutions for their efficient maintenance. Appropriate documentation is considered fundamental for maintenance and evolution. This situation is even more crucial when one considers today's cooperative environments for designing and developing artifacts. Most of the time, documentation is static and describes what an artifact is, and sometimes how it was designed and constructed. Moreover, in collaborative work, documentation serves as one of the means of communication among all involved in creating an artifact. However, several other types of documentation needs have been identified in many domains -- e.g., medicine, engineering, biology or astronomy -- such as flexible versioning for keeping track of an artifact's entire evolution, as well as documentation of the reasoning (the why) behind its construction. Unfortunately, no comprehensive system exists to handle all these documentation requirements: each kind of document is managed by a separate system, and furthermore studied in a different Computer Science field. The what documentation may fall within database or software engineering research, whereas the how is often restricted to hypermedia systems and CSCW, and the why is handled in the context of Artificial Intelligence and cognitive science. This paper presents a unified framework to manage all these kinds of documents for engineering artifacts within a single database. This allows integrating and coordinating the (cooperative) work of different types of users of these artifacts: designers, customers, salespeople, constructors. This eliminates the break in continuity found in usual environments, where each kind of documentation is handled separately and uses distinct implementation paradigms. Our framework is exemplified in the context of software module configuration.

Urban planning is one of the main areas that use geographic information systems. The basis of urban planning implementation is the Basic Urban Mapping (BUM), which is the set of graphic and alphanumeric information related to the cartographic base of the cadastral map. BUM modeling depends on the needs and purposes of the applications, on the investment directed to them, and on the different perceptions of users. Because of this, different applications in general do not share their models during development, increasing their cost. This paper describes an integration experience between two real-life applications: Telebrás (Telecomunicações Brasileiras S/A) and Eletropaulo (Eletricidade de São Paulo S/A).

Content-Based Image Retrieval (CBIR) systems have been developed aiming at enabling users to search and retrieve images based on properties such as shape, color and texture. In this paper, we are concerned with shape-based image retrieval. Here, we discuss a recently proposed shape descriptor, called contour saliences, defined as the influence areas of a contour's higher curvature points. This paper introduces a robust approach to estimate contour saliences by exploiting the relation between a contour and its skeleton, modifies the original definition to include the location and the value of saliences along the contour, and proposes a new metric to compare contour saliences. The paper also evaluates the effectiveness of the proposed descriptor with respect to Fourier Descriptors, Curvature Scale Space and Moment Invariants.
The Semantic Web has become an active research area with many promising applications. This paper gives a concrete contribution to the adoption of Semantic Web technology in GIS, by describing the use of a domain ontology to help navigation on maps and to support the integration of geographic objects on the Web. The OntoCarta system, which we are developing to demonstrate our methods, relies on current standards and public domain tools to build a map navigator including: (1) a viewer for maps at different scales; (2) a domain ontology to describe and correlate map objects. The combination of these components results in a knowledge-directed cartographic navigation system. This system supports map zooming while keeping contextual information for different levels of abstraction. The adoption of open formats to represent the domain ontology, allied to the consensual character of this ontology, enables the use of OntoCarta in a Web browser and fosters data reuse throughout the Internet.
Crop forecasting is an activity practiced by experts in agriculture, based on large data volumes. These data cover climatological information of the most diverse types, concerning a geographic region and a crop type. Besides volume, another problem to face is data heterogeneity. This paper presents a project for the development of a data management system for crop forecasts. The paper is centered on the management of pluviometric data, an important factor in crop management. The system is being deployed by Embrapa, the Brazilian Agricultural Research Corporation, and part of it is already available on the Web.

This paper presents a system developed at UNICAMP for automatically maintaining topological constraints in a geographic database. This system is based on extending to spatial data the notion of standard integrity maintenance through active databases. Topological relations, defined by the user, are transformed into spatial integrity constraints, which are stored in the database as production rules. These rules are used to maintain the corresponding set of topological relations for all applications that use the database. This extends previous work on rules and GIS by incorporating the rules into the DBMS rather than having them handled by a separate module.
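The active-database idea above follows the event-condition-action pattern: an update event fires stored rules, each checking a user-defined topological relation and rejecting violating updates. The toy sketch below illustrates that pattern only; it uses axis-aligned bounding boxes and a simple disjointness predicate as a stand-in for real spatial geometry, and all names are our own, not from the paper's system.

```python
def disjoint(a, b):
    """True if two axis-aligned boxes (xmin, ymin, xmax, ymax) do not overlap."""
    return a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]

class GeoTable:
    """Minimal stand-in for a spatial table with production rules on insert."""

    def __init__(self):
        self.rows = []
        self.rules = []          # (predicate, error message) pairs

    def add_rule(self, check, message):
        # The rule must hold between the new geometry and every stored one.
        self.rules.append((check, message))

    def insert(self, box):
        # Event: insert. Condition: each rule against each existing row.
        # Action: reject the update if any rule is violated.
        for check, message in self.rules:
            for existing in self.rows:
                if not check(box, existing):
                    raise ValueError(message)
        self.rows.append(box)
```

In the paper's setting, the predicates would be full topological relations evaluated by the DBMS itself, so every application sharing the database gets the same enforcement for free.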

Protein clustering is widely used to characterize proteins functionally. Many automatic methods for protein clustering use a graph-based approach. In this work, we propose a methodology for evaluating the solutions produced by these methods.
GIS have become important tools in public planning activities (e.g., in environmental or utility management). This type of activity requires the creation and management of alternative scenarios, as well as analysis of temporal data evolution. Existing systems provide limited support for these operations, and appropriate tools are yet to be developed. This paper presents a solution to this problem, based on managing temporal data and alternatives using the DBV version mechanism.
While workflow is playing an increasingly important role in e-Science, current systems lack support for the collection of provenance data. We argue that workflow provenance data should be automatically generated by the enactment engine and managed over time by an underlying storage service. We briefly describe our layered model for workflow execution provenance, which allows navigation from the conceptual model of an experiment to instance data collected during a specific experiment run, and back.
Bioinformatics activities present new challenges, such as how to exchange and reuse successful experimental procedures, tools and data, and how to understand and provide interoperability among data and tools across different sites, for distinct user profiles. This thesis is an effort towards these directions. It is based on combining research on databases, AI and scientific workflows, on the Semantic Web, to design, reuse, annotate and document bioinformatics experiments. The resulting framework allows the integration of heterogeneous data and tools and the design of experiments as scientific workflows, which are stored in databases. Moreover, it takes advantage of the notion of planning in AI to support automatic or interactive composition of tasks. These ideas are being implemented in a prototype and validated on real bioinformatics data.
Biodiversity research requires associating data about living beings and their habitats, integrating information ranging from geographical features to domain specifications, often through ontologies. In this context arise the so-called Biodiversity Information Systems, new management solutions that allow researchers to analyze species characteristics and their interactions. The goal of this project is to specify and develop an ontology Web service that can be used by different biodiversity systems. The main contributions of this work are: the specification of the requirements of an ontology service; and the specification and implementation of an ontology server. This research is directly connected with the first challenge (management of large multimedia data volumes), and provides support to research in challenge 2 (computational modeling in complex systems).
DNA fragment assembly is an area which makes intensive use of computers. However, computer users in this field are typically not experts in computer science, but build their working environments on an ad-hoc basis. In this situation, it seems appropriate to offer a kind of support which can contribute to a better organization of working environments, and a better exploitation of computer hardware and software. The authors describe an approach in this direction based on the emerging paradigms of workflow modeling and management. In particular they offer three contributions: first, they discuss why workflow management can be fruitfully adopted in DNA fragment assembly, and describe one way to perceive and model sequencing processes as workflows. Second, they outline an architecture of a system intended to support sequencing applications, whose core component is a workflow management system. Finally, they sketch their experience of building a prototype using commercial workflow management technology.

This paper discusses ongoing research on scientific workflows at the Institute of Computing, University of Campinas (IC - UNICAMP), Brazil. Our projects with bio-scientists have led us to develop a scientific workflow infrastructure named WOODSS. This framework has two main objectives: to help scientists to specify and annotate their models and experiments; and to document collaborative efforts in scientific activities. In both contexts, workflows are annotated and stored in a database. This "annotated scientific workflow" database is treated as a repository of (sometimes incomplete) approaches to solving scientific problems. Thus, it serves two purposes: it allows comparison of distinct solutions to a problem, and their designs; and it provides reusable and executable building blocks to construct new scientific workflows, to meet specific needs. Annotations, moreover, allow further insight into methodology, success rates, underlying hypotheses and other issues in experimental activities. The many research challenges we face at the moment include: the extension of this framework to the Web, following Semantic Web standards; providing means of discovering workflow components on the Web for reuse; and taking advantage of planning in Artificial Intelligence to support composition mechanisms. This paper describes our efforts in these directions, tested over two domains: agro-environmental planning and bioinformatics.
Bioinformatics activities are growing all over the world, with a proliferation of data and tools. This brings new challenges: how to understand and organize these resources, and how to provide interoperability among tools to achieve a given goal. We defined and implemented a framework to help meet some of these challenges. Three main issues were considered: the use of Web services as a basic unit; the notion of a Semantic Web to improve interoperability at the syntactic and semantic levels; and the use of scientific workflows to coordinate the services to be executed, including their interdependencies and service orchestration.
We report novel features of the genome sequence of Leptospira interrogans serovar Copenhageni, a highly invasive spirochete. Leptospira species colonize a significant proportion of rodent populations worldwide and produce life-threatening infections in mammals. Genomic sequence analysis reveals the presence of a competent transport system with 13 families of genes encoding major transporters, including a three-member component efflux system compatible with the long-term survival of this organism. The leptospiral genome contains a broad array of genes encoding regulatory systems, signal transduction and methyl-accepting chemotaxis proteins, reflecting the organism's ability to respond to diverse environmental stimuli. The identification of a complete set of genes encoding the enzymes of the cobalamin biosynthetic pathway, and of novel coding genes related to lipopolysaccharide biosynthesis, should shed new light on the study of Leptospira physiology. Genes related to toxins, lipoproteins and several surface-exposed proteins may facilitate a better understanding of Leptospira pathogenesis and may serve as potential vaccine candidates.
Environmental planning requires constant tracing and revision of activities. Planners must be provided with appropriate documentation tools to aid communication among them and to support plan enactment, revision and evolution. Moreover, planners often work in distinct institutions, so these supporting tools must interoperate in distributed environments and in a semantically coherent fashion. Since semantics are strongly related to use, documentation also enhances the ways in which users can cooperate. The emergence of the Semantic Web created the need for documenting Web data and processes using specific standards. This paper addresses this problem for two issues: (1) ways of documenting planning processes in three different aspects: what was done, how it was done, and why it was done that way; and (2) a framework that supports the management of those documents using Semantic Web standards.
This paper presents a new formalism for workflow process definition, which combines research in programming languages and in database systems. This formalism is based on creating a library of workflow building blocks, which can be progressively combined and nested to construct complex workflows. Workflows are specified declaratively, using a simple high level language, which allows the dynamic definition of exception handling and events, as well as dynamically overriding workflow definition. This ensures a high degree of flexibility in data and control flow specification, as well as in reuse of workflow specifications to construct other workflows. The resulting workflow execution environment is well suited to supporting cooperative work.
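The nesting idea behind this formalism can be sketched with composable callables. This is only an illustrative analogy in Python, not the paper's declarative language (which also covers exception handling, events, and dynamic overriding), and all block names below are invented:

```python
def block(name, *steps):
    """Build a workflow block from sub-blocks; since blocks are themselves
    callables, they can be nested and reused to form larger workflows."""
    def run(data):
        for step in steps:          # execute sub-blocks in sequence
            data = step(data)
        return data
    run.block_name = name
    return run

# A library of small blocks, progressively combined into larger ones.
normalize = block("normalize", str.strip, str.lower)
tokenize = block("tokenize", normalize, str.split)  # reuses `normalize`

result = tokenize("  Hello World  ")
# result → ['hello', 'world']
```

Because each block is itself a building block, a library of such definitions can be combined and nested exactly as the abstract describes, with reuse falling out of ordinary composition.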
This paper describes the POESIA approach to systematic composition of Web services. This pragmatic approach is strongly centered on the use of domain-specific multidimensional ontologies. Inspired by application needs and founded on ontologies, workflows, and activity models, POESIA provides well-defined operations (aggregation, specialization, and instantiation) to support the composition of Web services. POESIA complements current proposals for Web service definition and composition by providing a higher degree of abstraction with verifiable consistency properties. We illustrate the POESIA approach using a concrete application scenario in agro-environmental planning.
This paper presents two shape descriptors, multiscale fractal dimension and contour saliences, using a graph-based approach: the image foresting transform. It introduces a robust approach to locating contour saliences from the relation between contour and skeleton. The contour salience descriptor consists of a vector, with salience locations and values along the contour, and a matching algorithm. We compare both descriptors with fractal dimension, Fourier descriptors, moment invariants, Curvature Scale Space and Beam Angle Statistics with respect to their invariance among objects that belong to the same class (compactability) and their ability to separate objects of distinct classes (separability).
Biodiversity Information Systems (BISs) involve all kinds of heterogeneous data, including ecological and geographical features. However, available information systems offer very limited support for managing such data in an integrated fashion. Furthermore, these systems do not fully support image content management (e.g., photos of landscapes or living organisms), a requirement of many BIS end-users. In order to meet their needs, these users -- e.g., biologists, environmental experts -- often have to alternate between separate biodiversity and image information systems to combine the information extracted from them. This hampers the addition of new data sources, as well as cooperation among scientists. The approach presented in this paper meets these needs by taking advantage of advances in digital libraries to integrate networked collections of heterogeneous data. It focuses on creating the basis for a next-generation BIS, combining new techniques of content-based image retrieval with database query processing mechanisms. The paper shows the use of this component-based architecture to support the creation of two tailored BISs dealing with fish specimen identification using search techniques. Experimental results suggest that this new approach improves the effectiveness of the fish identification process when compared to the traditional key-based method.
The State University of Campinas (UNICAMP) is one of Brazil's foremost universities, responsible for 15% of the country's scientific publications. Every year, over 50,000 students from all over the country apply to enter one of the university's 60 undergraduate courses, and only 10% pass the strict entrance examinations. Created over 35 years ago as a research-oriented university, it has half of its 30,000 students enrolled in graduate programs and awards 1,000 Master's and 700 PhD degrees every year. Another 14,000 people are enrolled in continuing education courses. This student body is taught by 1,800 faculty, 97% of whom have PhD degrees. This profile of student and faculty qualification, allied to good research facilities, provides very good opportunities for innovative research involving both graduate and undergraduate students. Remote sensing (RS) research, multidisciplinary by nature, has found in UNICAMP a good environment in which to flourish. Several laboratories conduct work on different aspects of the use of this technology, involving faculty with distinct profiles. Rather than one single center dedicated to RS, several laboratories develop initiatives in this area with distinct application domains in mind. This paper gives a brief overview of the work conducted in two distinct domains – agriculture and geology – with projects resulting from the cooperation of experts in computer science and in the study and application of RS to these domains. It must be stressed that other groups in the university also conduct work involving remote sensing technology – e.g., for biodiversity analysis – but this paper presents a good sample of relevant ongoing projects. The authors work in four distinct laboratories, but collaborate in various research and training activities. As will be seen, a few of the projects described in the sections that follow involve people from all the laboratories concerned.
This work exploits the resemblance between content-based image retrieval and image analysis with respect to the design of image descriptors and their effectiveness. In this context, two shape descriptors are proposed: contour saliences and segment saliences. Contour saliences revisits its original definition, where the location of concave points was a problem, and provides a robust approach to incorporate concave saliences. Segment saliences introduces salience values for contour segments, making it possible to use an optimal matching algorithm as distance function. The proposed descriptors are compared with convex contour saliences, curvature scale space, and beam angle statistics using a fish database with 11,000 images organized in 1100 distinct classes. The results indicate segment saliences as the most effective descriptor for this particular application and confirm the improvement of the contour salience descriptor in comparison with convex contour saliences.
OBJECTIVE: To investigate whether image analysis of routine hematoxylin-eosin (H-E) skin sections using fast Fourier transformation (FFT) could detect structural alterations in patients with Sjogren-Larsson syndrome (SLS) diagnosed by molecular biology. STUDY DESIGN: Skin punch biopsies of 9 patients with SLS and 17 healthy volunteers were obtained. Digital images of routine histologic sections were taken, and their gray-scale luminance was analyzed by FFT. Inertia values were determined for different ranges of spatial frequencies in the vertical and horizontal directions. To estimate anisotropy, we calculated the resultant vector of the designated frequency ranges. RESULTS: In the prickle cell layer, SLS patients showed more intense amplitudes in spatial structures with periods between 1.2 and 3.6 µm in the vertical direction, which correlated in part with accentuated nuclei, nucleoli and perinucleolar halos in the H-E sections. In a linear discriminant analysis, the variables derived from the FFT images correctly discriminated 84.6% of the patients. Texture features derived from the gray-level co-occurrence matrix were not able to separate the groups. CONCLUSION: Exploratory texture analysis by FFT was able to detect discrete alterations in the prickle cell layer in routine light microscopy slides of SLS patients. The structural changes identified by FFT may be related to abnormal cellular components associated with aberrant lipid metabolism.
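As a minimal sketch of this kind of directional spectral analysis (not the authors' exact inertia measure; the frequency band, image size, and synthetic stripe image are assumptions for illustration), one can sum FFT amplitudes in a frequency band separately along each axis:

```python
import numpy as np

def directional_band_amplitude(gray, low, high):
    """Sum FFT amplitudes in a spatial-frequency band, separately along
    the vertical and horizontal frequency axes of a grayscale image."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(gray)))
    cy, cx = gray.shape[0] // 2, gray.shape[1] // 2
    fy = np.abs(np.arange(gray.shape[0]) - cy)[:, None]  # vertical frequency
    fx = np.abs(np.arange(gray.shape[1]) - cx)[None, :]  # horizontal frequency
    vertical = spectrum[(fy >= low) & (fy <= high) & (fx == 0)].sum()
    horizontal = spectrum[(fx >= low) & (fx <= high) & (fy == 0)].sum()
    return vertical, horizontal

# Horizontal stripes vary only vertically, so the band energy should
# concentrate on the vertical frequency axis.
y = np.arange(64)[:, None]
stripes = np.tile(0.5 + 0.5 * np.cos(2 * np.pi * 8 * y / 64), (1, 64))
v, h = directional_band_amplitude(stripes, 4, 12)
```

Comparing such band sums between directions is one simple way to quantify the vertical-versus-horizontal anisotropy the abstract describes.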
Polyanionic collagen-elastin matrices (PCEMs) are osteoconductive scaffolds that present high biocompatibility and efficacy in the regeneration of bone defects. In this study, the objective was to determine whether these matrices are directly mineralized during the osteogenesis process, and their influence on the organization of the new bone extracellular matrix. Samples of three PCEMs, differing in their charge density, were implanted into critical-sized calvarial bone defects created in rats and evaluated from 3 days up to 1 year after implantation. The implanted PCEMs were directly biomineralized by osteoblasts, as shown by ultrastructural, histoenzymologic, and morphologic analyses. The removal of the implants occurred during the bone remodeling process. The organization of the new bone matrix was evaluated by image texture analysis, determining the Shannon entropy and the fractal dimension of digital images. The bone matrix complexity decreased as osteogenesis progressed, approaching the values obtained for the original bone structure. These results show that the PCEMs allow faster formation of new bone by direct biomineralization of their structure, skipping the biomaterial resorption phase.
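A minimal sketch of the entropy half of that texture measurement, assuming a plain histogram-based Shannon entropy over 8-bit gray levels (the paper's exact computation may differ, and the test images here are synthetic):

```python
import numpy as np

def shannon_entropy(gray, bins=256):
    """Shannon entropy (in bits) of a grayscale image's intensity
    histogram -- a simple global measure of texture complexity."""
    hist, _ = np.histogram(gray, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                      # drop empty bins so log2 is defined
    return float(-(p * np.log2(p)).sum())

# A uniform image carries no information; uniform random noise is close
# to the 8-bit maximum of 8 bits.
flat = np.full((64, 64), 128, dtype=np.uint8)
noisy = np.random.default_rng(0).integers(0, 256, size=(64, 64), dtype=np.uint8)
e_flat, e_noisy = shannon_entropy(flat), shannon_entropy(noisy)
```

Lower entropy corresponds to a more ordered texture, which matches the abstract's observation that matrix complexity decreases as the new bone approaches its final organization.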
The fractal nature of the DNA arrangement has been postulated to be a common feature of all cell nuclei. We investigated the prognostic importance of the fractal dimension (FD) of chromatin in blasts of patients with acute precursor B lymphoblastic leukemia (B-ALL). In 28 patients, gray scale transformed pseudo-3D images of 100 nuclei (May-Grünwald-Giemsa stained bone marrow smears) were analyzed. FD was determined by the Minkowski-Bouligand method extended to three dimensions. Goodness-of-fit of FD was estimated by the R2 values in the log-log plots. Whereas FD presented no prognostic relevance, patients with higher R2 values showed prolonged survival. White blood cell count (WBC), age and mean fluorescence intensity of CD45 (MFICD45) were all unfavorable prognostic factors in univariate analyses. In a multivariate Cox regression, R2, WBC, and MFICD45 entered the final model, which proved stable in a bootstrap resampling study. Blasts with lower R2 values, equivalent to accentuated "coarseness" of the chromatin pattern, which may reflect profound changes in DNA methylation, indicated a poor prognosis. In conclusion, the goodness-of-fit of the Minkowski-Bouligand dimension of chromatin can be regarded as a new and biologically relevant prognostic factor for patients with B-ALL.
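The underlying measurement can be sketched with classical 2D box counting; the paper extends the Minkowski-Bouligand method to three dimensions, so this is only an illustrative 2D version with an invented test image, showing how both the FD (the regression slope) and its R2 goodness-of-fit arise from the same log-log fit:

```python
import numpy as np

def box_counting_dimension(img, sizes=(2, 4, 8, 16, 32)):
    """Estimate the Minkowski-Bouligand (box-counting) dimension of a
    binary image, plus the R^2 goodness-of-fit of the log-log regression."""
    counts = []
    for s in sizes:
        # Trim so the image tiles exactly into s x s boxes.
        h, w = (img.shape[0] // s) * s, (img.shape[1] // s) * s
        tiles = img[:h, :w].reshape(h // s, s, w // s, s)
        # Count boxes containing at least one foreground pixel.
        counts.append(int(tiles.any(axis=(1, 3)).sum()))
    log_inv_eps = np.log(1.0 / np.array(sizes, dtype=float))
    log_n = np.log(np.array(counts, dtype=float))
    slope, intercept = np.polyfit(log_inv_eps, log_n, 1)
    predicted = slope * log_inv_eps + intercept
    ss_res = np.sum((log_n - predicted) ** 2)
    ss_tot = np.sum((log_n - log_n.mean()) ** 2)
    return slope, 1.0 - ss_res / ss_tot

# A filled square should have dimension close to 2 with a near-perfect fit.
square = np.ones((256, 256), dtype=bool)
fd, r2 = box_counting_dimension(square)
```

The R2 value measures how well the structure actually follows a power law across scales, which is precisely the quantity the study found to carry prognostic information.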
The Web is evolving from a space for publication/consumption of documents into an environment for collaborative work, where digital content can travel and be replicated, adapted, decomposed, fused, and transformed. We call this the Fluid Web perspective. This view requires a thorough revision of the typical document-oriented approach that permeates content management on the Web. This paper presents our solution for the Fluid Web, which allows moving from a document-oriented to a content-oriented perspective, where "content" can be any digital object. The solution is based on two axes: a self-descriptive unit to encapsulate any kind of content artifact, the Digital Content Component (DCC); and a Fluid Web infrastructure that provides management and deployment of DCCs through the Web, and whose goal is to support collaboration on the Web. Designed to be reused and adapted, DCCs encapsulate data and software using a single structure, thus allowing homogeneous composition and processing of any digital content, executable or not. These properties are exploited by our Fluid Web infrastructure, which supports DCC multilevel annotation and discovery mechanisms, configuration management, and version control. Our work extensively explores taxonomic ontologies and Semantic Web standards, which serve as a semantic bridge, unifying DCC management vocabularies and improving DCC description, indexing, and discovery. The DCCs and the infrastructure have been implemented and are illustrated by means of a running example for a scientific application.
This paper presents the multidisciplinary approach we adopted to build an information system for decision support in road traffic management. The system architecture, the data warehouse schema, and the various numerical and symbolic representations of the spatio-temporal sequences stored in the warehouse are detailed.
We present a simple data model that can be used as a building block in a comparative genomics information system for prokaryotic genomes. The model is extensible and flexible, and has as its main entities the organism and the gene family. Existing systems tend to focus either on organisms or on gene families. We have applied the model to a set of eight bacterial genomes, and briefly describe the resulting system.
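A minimal sketch of such a model, with the organism and the gene family as the two central entities linked through genes; all class, field, and identifier names here are illustrative, not the paper's schema:

```python
from dataclasses import dataclass, field

@dataclass
class Organism:
    name: str
    genes: list = field(default_factory=list)

@dataclass
class GeneFamily:
    family_id: str
    members: list = field(default_factory=list)

@dataclass
class Gene:
    locus_tag: str
    organism: Organism
    family: GeneFamily

def add_gene(locus_tag, organism, family):
    """Register a gene under both its organism and its gene family,
    so comparative queries can start from either entity."""
    gene = Gene(locus_tag, organism, family)
    organism.genes.append(gene)
    family.members.append(gene)
    return gene

ecoli = Organism("Escherichia coli K-12")
fam = GeneFamily("FAM0001")
add_gene("gene_0001", ecoli, fam)
```

Linking each gene to both entities is what lets the same store answer organism-centric questions ("which families does this genome contain?") and family-centric ones ("which organisms share this family?").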
The advances of multimedia models and tools have popularized the access and production of multimedia content: in this new scenario, there is no longer a clear distinction between authors and end-users of a production. These user-authors often work in a collaborative way. As end-users, they collectively participate in interactive environments, consuming multimedia artifacts. In their author role, instead of starting from scratch, they often reuse others' productions, which can be decomposed, fused and transformed to meet their goals. Since the need for sharing and adapting productions is felt by many communities, there has been a proliferation of standards and mechanisms to exchange complex digital objects for distinct application domains. However, these initiatives have created another level of complexity, since people have to decide which share/reuse solution they want to adopt, and may even have to resort to programming tasks. They also lack effective strategies to combine these reused artifacts. This paper presents a solution to this demand, based on a user-author centered multimedia building block model—the digital content component (DCC). DCCs upgrade the notion of digital objects to digital components, as they homogeneously wrap any kind of digital content (e.g., multimedia artifacts, software) inside a single component abstraction. The model is fully supported by a software infrastructure, which exploits the model's semantic power to automate low-level technical activities, thereby freeing user-authors to concentrate on creative tasks. Model and infrastructure improve recent research initiatives to standardize the means of sharing and reusing domain-specific digital content. The paper's contributions are illustrated using examples implemented in a DCC-based authoring tool, in real-life situations.
Bioinformatics activities are growing all over the world, with proliferation of data and tools. This brings new challenges, such as how to understand and organize these resources, how to exchange and reuse successful experimental procedures, tools and data, and how to provide interoperability among data and tools across different sites, and for distinct user profiles. This paper describes an effort toward these directions. It is based on combining research on ontology management, AI and scientific workflows, on the Semantic Web, to design, reuse, annotate and document bioinformatics experiments. The resulting framework takes advantage of ontologies to support the specification and annotation of bioinformatics workflows, and to serve as the basis for tracking data provenance. Moreover, it uses AI planning techniques to support automatic or interactive composition of tasks. These ideas have been implemented in a prototype and validated on real bioinformatics data.
For the first provenance challenge, we introduce a layered model to represent workflow provenance that allows navigation from an abstract model of the experiment to instance data collected during a specific experiment run. We outline modest extensions to a commercial workflow engine so that it will automatically capture provenance at workflow runtime. We also present an approach to storing this provenance data in a relational database. Finally, we demonstrate how the core provenance queries in the challenge can be expressed in SQL and discuss the merits of our layered representation. Copyright © 2007 John Wiley & Sons, Ltd.
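A core lineage query of this kind can be expressed with a recursive SQL query over a relational store. The single-table schema below is a toy illustration, not the paper's layered model, and all artifact names are invented:

```python
import sqlite3

# Toy provenance store: each row records that artifact `child` was
# derived from artifact `parent` by some process.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE derived_from (child TEXT, parent TEXT, process TEXT);
INSERT INTO derived_from VALUES
  ('plot.png',  'stats.csv', 'plot'),
  ('stats.csv', 'clean.csv', 'summarize'),
  ('clean.csv', 'raw.csv',   'clean');
""")

# Core provenance question: everything plot.png transitively depends on.
ancestors = conn.execute("""
WITH RECURSIVE lineage(artifact) AS (
    SELECT parent FROM derived_from WHERE child = 'plot.png'
    UNION
    SELECT d.parent FROM derived_from d
    JOIN lineage l ON d.child = l.artifact
)
SELECT artifact FROM lineage ORDER BY artifact;
""").fetchall()
# ancestors → [('clean.csv',), ('raw.csv',), ('stats.csv',)]
```

The recursive common table expression walks the derivation graph upward, which is the standard SQL idiom for the "find all ancestors of a data product" class of challenge queries.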

Environmental planners take advantage of Spatial Decision Support Systems (SDSS) to deal with data and models for problem solving. However, this kind of software usually provides generic models, which require considerable effort to be specialized to fit particular situations. This paper explores a solution that couples Case-Based Reasoning (CBR) to an existing SDSS, named WOODSS, to help planners profit from others' experiences. WOODSS is based on a Geographic Information System, and interactively documents planners' modeling activities by means of scientific workflows, which are stored in a database. This paper describes how CBR has been used as part of WOODSS' retrieval and storage mechanisms to identify similar models for reuse in new decision processes. This adds a new dimension to the functionality of available SDSS.

Computational analyses of four bacterial genomes of the Xanthomonadaceae family reveal new unique genes that may be involved in adaptation, pathogenicity, and host specificity. The Xanthomonas genus presents 3636 unique genes distributed in 1470 families, while the Xylella genus presents 1026 unique genes distributed in 375 families. Among Xanthomonas-specific genes, we highlight a large number of cell wall degrading enzymes, proteases, and iron receptors, a set of energy metabolism genes, a second copy of the type II secretion system, the type III secretion system, flagella and chemotactic machinery, and the xanthomonadin synthesis gene cluster. Important genes unique to the Xylella genus are an additional copy of a type IV pili gene cluster and the complete machinery of colicin V synthesis and secretion. Intersections of gene sets from both genera reveal a cluster of genes homologous to Salmonella's SPI-7 island in Xanthomonas axonopodis pv citri and Xylella fastidiosa 9a5c, which might be involved in host specificity. Each genome also presents important unique genes, such as an HMS cluster, the kdgT gene, and O-antigen in Xanthomonas axonopodis pv citri; a number of avrBS genes and a distinct O-antigen in Xanthomonas campestris pv campestris; a type I restriction-modification system and a nickase gene in Xylella fastidiosa 9a5c; and a type II restriction-modification system and two genes related to peptidoglycan biosynthesis in Xylella fastidiosa temecula 1. All these differences imply a considerable number of gene gains and losses during the divergence of the four lineages, and are associated with structural genome modifications that may have a direct relation to the mode of transmission, adaptation to specific environments and pathogenicity of each organism.