More than one billion wireless devices are sold every year. This enormous volume, combined with the hardware and software complexity of the devices, has driven countless technical advances. A change of just a few pennies per unit in manufacturing cost, or a slip in the delivery schedule, can add up to a substantial sum when combined with the sheer volume of devices to be manufactured.
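To put the volume argument in numbers, using only the one-billion-units-per-year figure above: across the whole market, each cent of per-unit cost corresponds to roughly ten million dollars a year. This is simple arithmetic, not cost data for any particular product.

```latex
\$0.01\ \text{per unit} \times 10^{9}\ \text{units/year} = \$10^{7}\ \text{per year}
```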
One technology gaining widespread acceptance within the wireless design community is the use of virtual prototypes throughout the design cycle. Wireless engineers are leveraging virtual prototypes of their system-on-chip designs to improve product quality and speed time to market. A virtual prototype can be an indispensable tool for performing early architectural analysis for throughput and power tradeoffs. Firmware developers can use virtual prototypes to develop and debug their software in advance of real silicon. In addition, virtual prototypes can be used to optimize the throughput of designs that have been built already.
Moving beyond spreadsheets
Until recently, architectural exploration was relatively ad hoc. Back-of-the-envelope calculations, combined with a few spreadsheets and years of design expertise, made up the entire architectural flow for many chips and systems. This methodology is elegant in its simplicity but fails to deliver the architectural certainty that most wireless applications require.
A poorly designed architecture can manifest itself in many ways. The most visible evidence is that the chip doesn't meet its performance targets. Underperforming chips can sometimes be revived by creative firmware engineers and extra months in the lab, but more often than not, slow chips are relegated to the trash heap.
The more common, and far less obvious, architectural failing is overdesign. The architect adds together the worst-case figures of the various spreadsheet components, including combinations that would never occur together in the real world, and ends up with a design that far exceeds its specifications.
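The gap between "add up every worst case" and "find the worst case that actually happens" can be shown with a small C sketch. The three bus masters, their per-slot bandwidth figures (in MB/s) and the function names are all hypothetical; the point is only that the sum of individual peaks can be far larger than the peak of the combined load when those peaks never coincide.

```c
#include <assert.h>

/* Hypothetical bus masters; each row is one master's bandwidth demand
   (MB/s) sampled over the same sequence of time slots. */
enum { MASTERS = 3, SLOTS = 4 };

static const int example_demand[MASTERS][SLOTS] = {
    {400,  50,  50, 50},   /* CPU: burst in slot 0 */
    { 50, 300,  50, 50},   /* DMA: burst in slot 1 */
    { 50,  50, 200, 50},   /* display refresh: burst in slot 2 */
};

/* Spreadsheet-style estimate: add every master's individual peak,
   whether or not those peaks ever occur at the same time. */
int sum_of_peaks(const int demand[MASTERS][SLOTS])
{
    int total = 0;
    for (int m = 0; m < MASTERS; m++) {
        int peak = 0;
        for (int t = 0; t < SLOTS; t++)
            if (demand[m][t] > peak)
                peak = demand[m][t];
        total += peak;
    }
    return total;
}

/* Scenario-based estimate: add the demands slot by slot and keep the
   largest combined load that actually occurs. */
int peak_of_sums(const int demand[MASTERS][SLOTS])
{
    int worst = 0;
    for (int t = 0; t < SLOTS; t++) {
        int total = 0;
        for (int m = 0; m < MASTERS; m++)
            total += demand[m][t];
        if (total > worst)
            worst = total;
    }
    return worst;
}
```

With these figures the spreadsheet-style sum gives 900 MB/s, while the busiest slot actually needs 500 MB/s. The second function assumes realistic per-slot traffic is known, which is exactly the kind of data a virtual prototype can supply.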
Overdesigned architectures have hidden costs associated with them. Expensive, high-speed memory may be used in cases where less expensive, slower components might have sufficed. The main processor clock running at 400 MHz might have run fine at 300 MHz instead. The ultimate price for an overdesigned architecture is typically seen in an end product that costs more and has a shorter battery life than competitive products. The problems resulting from overdesign may not be as obvious as those from underdesign, but the end result is not much different.
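The clock-speed example can be quantified with the standard first-order model of CMOS dynamic power, where α is the switching activity, C the switched capacitance, V the supply voltage and f the clock frequency. Holding everything else fixed (an assumption; leakage power is ignored), dropping from 400 MHz to 300 MHz cuts dynamic power by a quarter, and any supply-voltage reduction the lower clock permits adds a quadratic saving on top.

```latex
P_{\text{dyn}} = \alpha\, C\, V^{2} f
\qquad\Rightarrow\qquad
\frac{P_{\text{dyn}}(300\ \text{MHz})}{P_{\text{dyn}}(400\ \text{MHz})} = \frac{300}{400} = 0.75
```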
Virtual prototypes remove the guesswork from architectural decisions because the architect can explore multiple design scenarios and get an in-depth understanding of the real-world impacts of various design decisions and intellectual property (IP) selections. Analysis tools can be used to display such critical items as throughput, loading and latency. These results can be examined on a cycle-by-cycle basis and directly correlated with various hardware and software components. The data available from a virtual prototype can enable an architect to confidently make big decisions, such as which IP block to select, as well as seemingly smaller ones, such as what arbitration scheme to employ.
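As a toy illustration of the cycle-by-cycle data that helps settle a question such as which arbitration scheme to employ, the following C sketch compares fixed-priority and round-robin arbitration for three hypothetical bus masters that request the bus every cycle. It is not a virtual prototype, and every name in it is invented for the example; it only counts grants and the longest wait per master.

```c
#include <assert.h>

enum { MASTERS = 3 };

/* Fixed priority: the lowest-numbered requesting master always wins. */
int arb_fixed(const int req[MASTERS], int *last)
{
    (void)last;                        /* fixed priority keeps no state */
    for (int m = 0; m < MASTERS; m++)
        if (req[m])
            return m;
    return -1;
}

/* Round robin: search for a requester starting just after the last winner. */
int arb_round_robin(const int req[MASTERS], int *last)
{
    for (int i = 1; i <= MASTERS; i++) {
        int m = (*last + i) % MASTERS;
        if (req[m]) {
            *last = m;
            return m;
        }
    }
    return -1;
}

/* Simulate `cycles` bus cycles with every master always requesting
   (a saturated, hypothetical workload). Records grants per master and the
   longest run of consecutive cycles each master spent waiting. */
void simulate(int (*arb)(const int *, int *), int cycles,
              int grants[MASTERS], int worst_wait[MASTERS])
{
    int req[MASTERS], waiting[MASTERS];
    int last = MASTERS - 1;            /* round robin starts at master 0 */
    for (int m = 0; m < MASTERS; m++) {
        req[m] = 1;
        grants[m] = waiting[m] = worst_wait[m] = 0;
    }
    for (int c = 0; c < cycles; c++) {
        int winner = arb(req, &last);
        for (int m = 0; m < MASTERS; m++) {
            if (m == winner) {
                grants[m]++;
                waiting[m] = 0;
            } else if (req[m]) {
                waiting[m]++;
                if (waiting[m] > worst_wait[m])
                    worst_wait[m] = waiting[m];
            }
        }
    }
}
```

Over nine saturated cycles, fixed priority gives every grant to master 0 and starves the others, while round robin grants each master three times and bounds every wait at two cycles. Real traffic is rarely saturated, which is why measured, workload-specific data matters more than a toy like this.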
Design And Reuse
Thursday, October 30, 2008
1394-2008 High Performance Serial Bus Standard
The IEEE Standards Board recently approved the 1394-2008 specification. 1394-2008 combines all IEEE-1394 specifications developed since the audio-video multimedia standard was founded in 1994. The 1394-2008 High Performance Serial Bus Standard updates and revises all prior 1394 standards dating back to the original 1394-1995 version, including 1394a, 1394b, 1394c, enhanced UTP, and the 1394 beta plus PHY-Link interface. It also incorporates the complete specifications for S1600 (1.6 Gbit/s) and S3200 (3.2 Gbit/s).
IEEE-1394, also known as FireWire and i.LINK, has been designed into a wide range of consumer, computer, industrial and other products since its inception, and is emerging as a powerful new standard for use in automotive entertainment systems.
The team dealt with errata remaining from prior specifications, and harmonized all message types, including fields that had been used in related specifications such as the 1394.1 bridging specification, and IDB-1394, which was developed as the original automotive entertainment standard. Not incorporated is work currently underway within the 1394 Trade Association working groups, including 1394 over coax and the new 1394-Automotive specification due later this summer.
How to improve FPGA-based ASIC prototyping with SystemVerilog
Introduction
ASICs provide a way to capture high-performance, complex design concepts and to prevent competitors from simply implementing comparable designs.
However, creating an ASIC is a high-investment proposition with development costs approaching $20M for a 90 nm ASIC/SoC design and expected to top $40M for a 45 nm SoC. Thus, increasingly, only a high-volume product can afford an ASIC.
Besides the increase in mask-set cost, total development cost is also increasing because of the reduced probability of getting the design right the first time. As design complexity continues to increase, surveys have shown that only about a third of today's SoC designs are bug-free in first silicon, and nearly half of all respins are reported as being caused by functional logic errors. As a result, verification managers are now exploring ways to strengthen their functional verification methodologies.
Using FPGAs to prototype ASIC designs, as part of an ASIC verification methodology, has been growing in popularity as a lower-cost way to demonstrate, before starting on a true ASIC design, that concepts are sound and that designs can be implemented.
An FPGA prototype of an ASIC design usually differs in performance, but it can deliver the same logical functionality. Further, running a design at speed on an FPGA prototype with real stimulus allows far more exhaustive and realistic functional coverage, as well as early integration with embedded software. FPGA prototyping can therefore be used effectively to supplement and extend existing functional verification methodologies.
Because ASIC designs have grown much faster than FPGA devices, multiple FPGAs must often be used to prototype a single ASIC. The main obstacle with multiple devices is connecting all of the logical blocks of the ASIC design across them. Physically, the high-speed I/O blocks in FPGA devices have simplified connectivity between devices. However, methods for logically connecting the design blocks have proven manually intensive and error-prone. With the introduction of SystemVerilog, an evolutionary RTL language, and advanced mixed-language synthesis tools such as Mentor Graphics' Precision Synthesis, the logical connection procedure has also been simplified.
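To make the bookkeeping concrete, the following C sketch (not SystemVerilog, and not how any synthesis tool works internally) counts how many signals cross device boundaries for a given block-to-FPGA assignment. Every crossing signal is a connection that has to be split across devices and kept consistent on both sides, which is where manual methods tend to go wrong. The block graph, signal widths and names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* A hypothetical netlist edge: `width` signals between two design blocks. */
struct link {
    int from_block;
    int to_block;
    int width;
};

/* Given the FPGA each block is assigned to, count the signals that must be
   routed between devices (i.e. connections that cross a partition). */
int inter_fpga_signals(const struct link *links, size_t n_links,
                       const int *fpga_of_block)
{
    int crossing = 0;
    for (size_t i = 0; i < n_links; i++)
        if (fpga_of_block[links[i].from_block] !=
            fpga_of_block[links[i].to_block])
            crossing += links[i].width;
    return crossing;
}
```

For a four-block design with links 0-1 (32 bits), 1-2 (64), 2-3 (8) and 0-3 (16), placing blocks {0,1} and {2,3} on separate FPGAs leaves 80 crossing signals, whereas placing {0,3} and {1,2} together leaves only 40. The partition choice changes how much cross-device wiring has to be described and checked.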
Taking a closer look at Intel's Atom multicore processor architecture
Multi-core processors are everywhere. In desktop computing, it is almost impossible to buy a computer today that doesn't have a multi-core CPU inside. Multi-core technology is also having an impact in the embedded space, where increased performance per watt presents a compelling case for migration.
Developers are increasingly turning to multi-core either because they want to improve the processing power of their product, or because they want to take advantage of some other technology that is 'bundled' with the multi-core package. Because this new parallel world can also present an engineering challenge, this article offers seven tips to help ease those first steps towards using these devices.
It's natural to want to use the latest technology in our favourite embedded design, and it is tempting to make a design a technological showcase, using all the latest knobs, bells and whistles. However, it is worth reminding ourselves that what is fashionable today will be 'old hat' within a relatively short period. If you have an application that works well the way it is, and is likely to keep performing adequately within the lifetime of the product, then there may be no point in upgrading.
One benefit of recent trends in processor design has been the focus on power efficiency. Before the introduction of multi-core, each new performance level was reached with silicon that could run at ever higher clock speeds. An unfortunate by-product of this speed race was that the heat dissipated by such devices made them unsuitable for many embedded applications.
As clock speeds increased, the physical limits of transistor technology drew ever closer, and researchers looked for new ways to increase performance without further increasing power consumption. They found that by lowering clock speeds and adding more cores to a processor, it was possible to achieve much better performance per watt.
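A first-order version of this argument uses the dynamic-power model P = αCV²f. Under two idealised assumptions, that supply voltage can be lowered in proportion to frequency and that the workload splits perfectly across cores, two cores at half the clock deliver the same throughput as one core at the full clock for a quarter of the dynamic power. Real voltage scaling is far more limited and leakage is ignored here, so actual gains are smaller.

```latex
P \propto V^{2} f,\quad V \propto f \;\Rightarrow\; P \propto f^{3}
\qquad
\frac{P_{\text{2 cores at } f/2}}{P_{\text{1 core at } f}} = 2\left(\frac{1}{2}\right)^{3} = \frac{1}{4}
```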
The introduction of multi-core, along with new gate technologies and a redesign of the most power-hungry parts of a CPU, has led to processors that use significantly less power, yet deliver greater raw processing performance than their antecedents.
An example is the Intel Atom, a low-power IA processor that uses 45 nm Hi-K transistor gates. By implementing an in-order pipeline, adding extra deep sleep states, supporting SIMD (Single Instruction Multiple Data) instructions and using efficient instruction decoding and scheduling, Intel has produced a powerful but not power-hungry piece of silicon. Taking advantage of the lower power envelope could in itself be a valid reason for using multi-core devices in an embedded design, even if the target application is still single-threaded.
Use advanced architectural extensions
The latest generation of CPUs all have various architectural extensions that come for 'free' and should be taken advantage of. One very effective but often underused extension is support for SIMD, that is, performing several calculations with a single instruction.
Developers often ignore these advanced operations because of the perceived effort of adding such instructions to application code. While it is possible to use these instructions by adding macros, inline assembler or dedicated library functions to the application code, many developers prefer to rely on the compiler to insert such instructions into the generated code automatically.
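For comparison with the compiler-driven approach, here is a minimal sketch of the hand-written route using SSE intrinsics from `<xmmintrin.h>`. The function name is hypothetical. `__SSE__` is predefined by GCC and Clang when targeting x86 with SSE enabled; on other compilers or targets only the scalar loop is built, so the file still compiles and gives the same results everywhere.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __SSE__
#include <xmmintrin.h>   /* SSE intrinsics (x86 only) */
#endif

/* out[i] = a[i] + b[i], processing four floats per instruction where SSE
   is available and falling back to plain scalar code otherwise. */
void add_arrays(float *out, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#ifdef __SSE__
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);             /* load 4 floats, any alignment */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));  /* 4 additions in one op */
    }
#endif
    for (; i < n; i++)                               /* leftover elements */
        out[i] = a[i] + b[i];
}
```

The explicit version ties the code to one instruction set and needs a scalar tail for lengths that are not a multiple of four, which is part of the effort that leads many developers to prefer auto-vectorisation.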
One technique, known as 'auto-vectorisation', can give an application a significant performance boost. In this technique the compiler looks for calculations that are performed in a loop. By replacing such calculations with, say, Streaming SIMD Extensions (SSE) instructions that each operate on several data elements, the compiler effectively reduces the number of loop iterations required. Some developers have seen their applications run twice as fast by turning on auto-vectorisation in the compiler.
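The kind of loop an auto-vectoriser looks for can be sketched as follows. The function is a hypothetical example, not taken from the article: unit-stride access, no function calls, no dependence between iterations, and `restrict` qualifiers promising the compiler that the arrays don't overlap. Whether a given compiler actually vectorises it depends on the compiler, its version and the target.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] = alpha * x[i] + y[i]: a loop shape compilers can usually
   auto-vectorise. `restrict` tells the compiler x and y never alias, so it
   may process several elements per SIMD instruction. With GCC, -O3 enables
   the vectoriser and -fopt-info-vec reports which loops were vectorised. */
void saxpy(size_t n, float alpha, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = alpha * x[i] + y[i];
}
```

No source change is needed to benefit; the same plain C loop is compiled to scalar or SIMD code depending on compiler options, which is why this route appeals to developers wary of intrinsics or inline assembler.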
Like the power gains of the previous section, using these architectural extensions may be a valid reason in itself for using a multi-core processor, even if you are not developing threaded code.
Not all programs are good candidates for parallelism. Even if your program seems to need a 'parallel facelift', it does not necessarily follow that going multi-core will help you. For example, say your product is an application running real-time weather pattern simulations, based on data collected from a number of remote sensors.
The measurements of wind speed, direction, temperature and humidity are being used to calculate the weather pattern over the next 30 minutes. Imagine that the application always produces its calculation results too late, and the longer the application runs the worse the timeliness of the simulation is.
One could assume that the poor performance is because the CPU is not powerful enough to do the calculations in time. Going parallel might be the right solution, but how do we prove this? It could just as well be that the real bottleneck is an I/O problem: the poor application performance might stem from the implementation of the remote data collection rather than from excessive CPU load.
There are a number of profiling tools available that can help form a correct picture of the running program. Such analysers typically rely on runtime architectural events that are generated by the CPU. Before you migrate your application to multi-core, it would be worth analysing the application with such a tool, using the information you glean to help in the decision making process.
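Before reaching for a full profiler, a much cruder first check is possible: compare the CPU time a phase consumes with the wall-clock time it takes. The sketch below is that substitute, not an event-based analyser of the kind described above. It assumes a POSIX system for `clock_gettime`, and the 0.8 threshold and function names are hypothetical choices for illustration.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Run one phase of the program and return CPU time / wall-clock time.
   A ratio near 1 suggests the phase is compute-bound; a low ratio suggests
   it spends most of its time waiting, for example on sensor I/O. */
double cpu_to_wall_ratio(void (*phase)(void))
{
    struct timespec w0, w1;
    clock_t c0 = clock();                    /* process CPU time */
    clock_gettime(CLOCK_MONOTONIC, &w0);     /* wall-clock time */
    phase();
    clock_gettime(CLOCK_MONOTONIC, &w1);
    clock_t c1 = clock();

    double cpu  = (double)(c1 - c0) / CLOCKS_PER_SEC;
    double wall = (double)(w1.tv_sec - w0.tv_sec)
                + (double)(w1.tv_nsec - w0.tv_nsec) / 1e9;
    return wall > 0.0 ? cpu / wall : 0.0;
}

/* Hypothetical threshold; tune it for the system being measured. */
const char *classify(double ratio)
{
    return ratio >= 0.8 ? "compute-bound" : "waiting-bound (check I/O)";
}
```

In the weather-simulation example, a low ratio for the data-collection phase would point towards the I/O implementation rather than towards more cores; a profiler is still needed to find out exactly where the time goes.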
There are different ways to introduce parallelism into the high-level design of a program. Three common strategies are functional parallelism, data parallelism and software pipelining.
In functional parallelism, each task or thread is allocated a distinct job; for example, one thread might be reading a temperature transducer while another thread carries out a series of CPU-intensive calculations.
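The temperature-transducer example can be sketched with OpenMP's `sections` construct, where each section is a distinct job that the runtime may give to a different thread. `read_temperature` and `heavy_calculation` are hypothetical stand-ins for real sensor and compute code. Each section writes a different variable, so no synchronisation is needed; built without OpenMP support, the pragmas are ignored and the two jobs simply run one after the other.

```c
#include <assert.h>

/* Hypothetical stand-ins for the two jobs in the example. */
static double read_temperature(void) { return 21.5; }   /* e.g. poll a sensor */

static double heavy_calculation(int n)                    /* CPU-intensive work */
{
    double acc = 0.0;
    for (int i = 1; i <= n; i++)
        acc += 1.0 / ((double)i * i);
    return acc;
}

/* Functional parallelism: two different jobs run concurrently. */
void run_jobs(double *temperature, double *result)
{
    #pragma omp parallel sections
    {
        #pragma omp section
        *temperature = read_temperature();

        #pragma omp section
        *result = heavy_calculation(1000000);
    }
}
```

With only two distinct jobs, this pattern occupies at most two cores, which illustrates why functional parallelism on its own does not grow with the core count.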
In data parallelism, each task or thread carries out the same type of activity on a different portion of the data. For example, a large matrix multiplication can be shared between, say, four cores, ideally reducing the time taken to perform that calculation by a factor of four.
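A minimal data-parallel sketch of the matrix example uses OpenMP's `parallel for`, which divides the iterations of the row loop among the available threads (four on a quad-core, for example). The fixed 4×4 size and the function name are assumptions for illustration. Rows of the result are independent, so the threads need no locking; without OpenMP the pragma is ignored and the loop runs serially with identical results.

```c
#include <assert.h>

enum { N = 4 };   /* hypothetical matrix size; real uses would be far larger */

/* c = a * b. Every thread runs the same code on a different set of rows. */
void matmul(const double a[N][N], const double b[N][N], double c[N][N])
{
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;                /* private to each iteration */
            for (int k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
}
```

The factor-of-four speed-up is an upper bound: thread start-up, memory bandwidth and uneven work per thread all eat into it, especially for small matrices like this one.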
A software pipeline is somewhat akin to a production line, where a series of workers each carry out a specific duty before passing the work on to the next worker in the line. In a multi-core environment, each worker, or pipeline stage, is assigned to a different core.
In traditional parallel programming, much emphasis is placed on the scalability of an application. Good scalability implies that a program running on a dual-core processor would run twice as fast on a quad-core.
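A simple timing model shows how a pipeline behaves. Assume (an idealisation) S stages that each take one time unit per item, one stage per core, and no hand-off overhead. Processing N items serially takes S·N units; once the pipeline fills, a finished item emerges every unit, so the total is S + N − 1. The speed-up approaches S, which means a three-stage pipeline stays close to 3x however many cores are available. The function names are hypothetical.

```c
#include <assert.h>

/* Idealised model: equal stage times, one stage per core, no hand-off
   overhead. Times are in units of one stage's work on one item. */
long serial_time(long stages, long items)    { return stages * items; }
long pipelined_time(long stages, long items) { return stages + items - 1; }

/* Tends towards `stages` as `items` grows, independent of any cores
   beyond one per stage. */
double pipeline_speedup(long stages, long items)
{
    assert(stages > 0 && items > 0);
    return (double)serial_time(stages, items) /
           (double)pipelined_time(stages, items);
}
```

For 3 stages and 100 items the model gives 300 units serially versus 102 pipelined, a speed-up of about 2.94. This is why a pipelined design does not automatically scale when moved from a dual-core to a quad-core part.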
In embedded systems, scalability is less important, because the platform the end product runs on tends not to change, and the shelf life of the end product is usually measured in years rather than months. When moving to multi-core, the embedded engineer may therefore do better not to be over-sensitive to the scalability of the design, and instead use whatever combination of data and functional parallelism delivers the best performance.
Using high-level constructs
Threading is not a new discipline, and most operating systems have an API that allows the programmer to create and manage threads. Using these APIs directly in the code is quite tough, so the recommendation is to work at a higher level of abstraction. One way of implementing threading is to use high-level constructs or extensions to the programming language.
OpenMP is a pragma-based language extension for C/C++ and FORTRAN that allows the programmer to very easily introduce parallelism into an existing program. The standard has been adopted by a number of compiler vendors including GNU, Intel, and Microsoft.
A full description of the standard can be found at www.openmp.org. With OpenMP it is easy to add parallelism to a program incrementally. Because the programming is pragma-based, your code can still be built on compilers that don't support OpenMP; such a compiler simply ignores the unrecognised pragmas, at most issuing a warning.
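A minimal sketch of both points: a single pragma parallelises an existing summation loop, and the standard `_OPENMP` macro, defined only when OpenMP is enabled (for example with GCC's `-fopenmp` or Microsoft's `/openmp`), guards the one direct runtime call so the same file builds with or without OpenMP. The function names are hypothetical.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* One pragma parallelises the existing loop. reduction(+:sum) gives each
   thread a private partial sum and adds the partial sums together at the
   end, so threads never update `sum` concurrently. */
double sum_array(const double *x, long n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += x[i];
    return sum;
}

/* OpenMP runtime calls must be guarded, because unlike pragmas they are
   not ignored by a compiler without OpenMP support. */
int worker_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Parallelism can then be added one loop at a time and checked against the serial build, which is the incremental style described above.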
As stated earlier, functional parallelism is potentially more interesting than data parallelism when developing an embedded application. An alternative to using OpenMP is to use one of the newly emerging language extensions which supply similar functionality. It is expected that eventually such language extensions will be adopted by an appropriate standards committee. An experimental compiler with such extensions can be found at www.whatif.intel.com.
An alternative to traditional programming languages is a graphical development environment. There are a number of 'program by drawing' development tools that take care of all the low-level threading implementation for the developer.
One example is National Instruments' LabVIEW, which allows the programmer to design a program diagrammatically by connecting a number of objects together. Adding multi-core support can be as simple as adding a loop block to the diagram.
When programs run in parallel, they can be very difficult to debug, especially with tools that are not enabled for parallelism. Identifying and debugging issues related to shared resources and shared variables, synchronisation between threads, and deadlocks and livelocks is notoriously difficult.
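The classic shared-variable bug looks innocent in source code. In the sketch below (function names hypothetical), the first version increments a shared counter from many threads; `count++` is a read-modify-write, so when the OpenMP pragma is honoured two threads can read the same old value and one increment is lost, giving results that vary from run to run. This is the kind of problem thread-checking tools are designed to flag. The second version makes each increment indivisible with `#pragma omp atomic`.

```c
#include <assert.h>

/* BUG (deliberate): unsynchronised updates to a shared variable. Results
   can vary from run to run when compiled with OpenMP enabled. */
long count_racy(long n)
{
    long count = 0;
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        count++;                  /* data race on `count` */
    return count;
}

/* Fixed: the atomic directive makes each increment indivisible.
   (A reduction(+:count) clause would be the faster fix for this loop.) */
long count_atomic(long n)
{
    long count = 0;
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        #pragma omp atomic
        count++;
    }
    return count;
}
```

Built without OpenMP, both functions return the right answer, which is one reason such bugs can survive single-threaded testing and only surface on multi-core hardware.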
However, there is now a growing number of tools from different vendors specifically designed to aid the debugging and tuning of parallel applications. Intel Thread Checker and Intel Thread Profiler are examples of tools that can be used to debug and tune parallel programs.
Where no parallel debugging tools are available for the embedded target you are working on, it is a legitimate practice to use standard desktop tools, carrying out the first set of tests on a desktop rather than the embedded target. It's a common experience that threading issues appearing on the target can often be first captured by running the application code on a desktop machine.
Stephen Blair-Chappell is a Technical Consulting Engineer at Intel Compiler Labs.
Developers are increasingly turning to multi-core because they either want to improve the processing power of their product, or they want to take advantage of some other technology that is 'bundled' within with the multi-core package. Because this new parallel world can also represent an engineering challenge, this article offers seven tips to help ease those first steps towards using these devices.
It's not unnatural to want to use the latest technology in our favourite embedded design. It is tempting to make a design a technological showcase, using all the latest knobs, bells and whistles. However, it is worth reminding ourselves that what is fashion today will be 'old hat' within a relatively short period. If you have an application that works well the way it is, and is likely to keep performing adequately within the lifetime of the product, then maybe there is no point in upgrading.
One of the benefits of recent trends within processor design has been the focus on power efficiency. Prior to the introduction of multi-core, new performance barriers were reached by providing silicon that could run at ever higher clock speeds. An unfortunate by-product of this speed race was that the heat dissipated from such devices made them unsuitable for many embedded applications.
As clock speeds increased, the limits of the transistor technology physics were moving ever closer. Researchers looked for new ways to increase performance without further increasing power consumption. It was discovered that by turning down the clock speeds and then adding additional cores to a processor, it was possible to get a much improved performance per Watt measurement.
The introduction of multi-core, along with new gate technologies and a redesign of the most power-hungry parts of a CPU, has led to processors that use significantly less power, yet deliver greater raw processing performance than their antecedents.
An example is the Intel Atom, a low power IA processor which uses 45nm Hi-K transistor gates. By implementing an in-order pipeline, adding additional deep sleep states, supporting SIMD (Single Instruction Multiple Data) instructions and using efficient instruction decoding and scheduling, Intel has produced a powerful but not power-hungry piece of silicon. Taking advantage of the lower power envelope could in itself be a valid reason for using multi-core devices in an embedded design " even if the target application is still single-threaded.
Multi-core processors are everywhere. In desktop computing, it is almost impossible to buy a computer today that doesn't have a multi-core CPU inside. Multi-core technology is also having an impact in the embedded space, where increased performance per Watt presents a compelling case for migration.
Developers are increasingly turning to multi-core because they either want to improve the processing power of their product, or they want to take advantage of some other technology that is 'bundled' within with the multi-core package. Because this new parallel world can also represent an engineering challenge, this article offers seven tips to help ease those first steps towards using these devices.
It's not unnatural to want to use the latest technology in our favourite embedded design. It is tempting to make a design a technological showcase, using all the latest knobs, bells and whistles. However, it is worth reminding ourselves that what is fashion today will be 'old hat' within a relatively short period. If you have an application that works well the way it is, and is likely to keep performing adequately within the lifetime of the product, then maybe there is no point in upgrading.
One of the benefits of recent trends within processor design has been the focus on power efficiency. Prior to the introduction of multi-core, new performance barriers were reached by providing silicon that could run at ever higher clock speeds. An unfortunate by-product of this speed race was that the heat dissipated from such devices made them unsuitable for many embedded applications.
As clock speeds increased, the limits of the transistor technology physics were moving ever closer. Researchers looked for new ways to increase performance without further increasing power consumption. It was discovered that by turning down the clock speeds and then adding additional cores to a processor, it was possible to get a much improved performance per Watt measurement.
The introduction of multi-core, along with new gate technologies and a redesign of the most power-hungry parts of a CPU, has led to processors that use significantly less power, yet deliver greater raw processing performance than their antecedents.
An example is the Intel Atom, a low power IA processor which uses 45nm Hi-K transistor gates. By implementing an in-order pipeline, adding additional deep sleep states, supporting SIMD (Single Instruction Multiple Data) instructions and using efficient instruction decoding and scheduling, Intel has produced a powerful but not power-hungry piece of silicon. Taking advantage of the lower power envelope could in itself be a valid reason for using multi-core devices in an embedded design " even if the target application is still single-threaded.
Use advanced architectural extensions
All the latest generation of CPUs have various architectural extensions that are there for 'free' and should be taken advantage of. One very effective but often underused extension is support for SIMD - that is, doing several calculations in one instruction.
Often developers ignore these advanced operations because of the perceived effort of adding such instructions to application code. While it is possible to use these instructions by adding macros, inline assembler or dedicated library functions to the application code, a favourite of many developers is to rely on the compiler to automatically insert such instruction in the generated code.
One technique known as 'auto-vectorisation' can lead to a significant performance boost of an application. In this technique the compiler looks for calculations that are performed in a loop. By replacing such calculations with, say, Streaming SIMD Extension (SSE) instructions, the compiler effectively reduces the number of loop iterations required. Some developers have seen their applications run twice as fast by turning on auto-vectorisation in the compiler.
Like the power gains of the previous section, using these architectural extensions may be a valid reason in itself for using a multi-core processor, even if you are not developing threaded code.
Not all programs are good candidates for parallelism. Even if your program seems to need a 'parallel facelift', it does not necessarily follow that going multi-core will help you. For example, say your product is an application running real-time weather pattern simulations, based on data collected from a number of remote sensors.
The measurements of wind speed, direction, temperature and humidity are being used to calculate the weather pattern over the next 30 minutes. Imagine that the application always produces its calculation results too late, and the longer the application runs the worse the timeliness of the simulation is.
One could assume that the poor performance is because the CPU is not powerful enough to do the calculations in time. Going parallel might be the right solution " but how do we prove this? Of course, it could be that the real bottleneck is an IO problem, the reason for the poor application performance being the implementation of the remote data collection and not excessive CPU load.
<>There are a number of profiling tools available that can help form a correct picture of the running program. Such analysers typically rely on runtime architectural events that are generated by the CPU. Before you migrate your application to multi-core, it would be worth analysing the application with such a tool, using the information you glean to help in the decision making process.
There are different ways that one can introduce parallelism into the high-level design of a program. Three common strategies available are functional parallelism, data parallelism and software pipe-lining.In functional parallelism, each task or thread is allocated a distinct job; for example one thread might be reading a temperature transducer, while another thread is carrying out a series of CPU intensive calculations.
In data parallelism, each task or thread carries out the same type of activity. For example, a large matrix multiplication can be shared between, say, four cores, thus reducing the time taken to perform that calculation by a factor of four.
A software pipeline is somewhat akin to a production line, where a series of workers carry out a specific duty before passing the work onto the next worker in the production line. In a multi-core environment, each worker " or pipeline " is assigned to a different core. In traditional parallel programming, much emphasis is laid on the scalability of an application. Good scalability implies that a program running on a dual-core processor would run twice as fast on a quad-core.
In embedded systems, scalability is less important because the platform the end product runs on tends not to change; the shelf life of the end product is usually measured in years rather than months. When moving to multi-core, the embedded engineer need not be over-sensitive to the scalability of the design, but can instead use whatever combination of data and functional parallelism delivers the best performance.
Using high-level constructs
Threading is not a new discipline, and most operating systems have an API that allows the programmer to create and manage threads. Using these APIs directly in the code is laborious and error-prone, so the recommendation is to work at a higher level of abstraction. One way of doing this is to use high-level constructs or extensions to the programming language.
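To see why the raw API is considered laborious, consider summing an array across four POSIX threads by hand. In this sketch (`threaded_sum()`, `sum_task` and `NUM_THREADS` are hypothetical names) the programmer must partition the work, package per-thread arguments, create and join each thread, and combine the partial results explicitly.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4

/* Every thread needs its own argument block describing its share of work. */
typedef struct {
    const double *data;
    size_t begin, end;
    double partial;          /* written by the worker, read after join */
} sum_task;

static void *sum_worker(void *arg)
{
    sum_task *task = arg;
    double s = 0.0;
    for (size_t i = task->begin; i < task->end; ++i)
        s += task->data[i];
    task->partial = s;
    return NULL;
}

/* Sums data[0..n) by creating, running and joining NUM_THREADS threads. */
double threaded_sum(const double *data, size_t n)
{
    pthread_t threads[NUM_THREADS];
    sum_task tasks[NUM_THREADS];
    int started[NUM_THREADS];
    size_t chunk = (n + NUM_THREADS - 1) / NUM_THREADS;

    for (int k = 0; k < NUM_THREADS; ++k) {
        size_t begin = (size_t)k * chunk;
        tasks[k].data = data;
        tasks[k].begin = begin < n ? begin : n;
        tasks[k].end = begin + chunk < n ? begin + chunk : n;
        /* If a thread cannot be created, do that share on this thread. */
        started[k] = pthread_create(&threads[k], NULL, sum_worker, &tasks[k]) == 0;
        if (!started[k])
            sum_worker(&tasks[k]);
    }

    double total = 0.0;
    for (int k = 0; k < NUM_THREADS; ++k) {
        if (started[k])
            pthread_join(threads[k], NULL);
        total += tasks[k].partial;
    }
    return total;
}
```

Most of these lines are bookkeeping rather than the calculation itself, which is the overhead that higher-level constructs aim to remove.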
OpenMP is a pragma-based language extension for C/C++ and Fortran that allows the programmer to introduce parallelism into an existing program very easily. The standard has been adopted by a number of compiler vendors, including GNU, Intel and Microsoft.
A full description of the standard can be found at www.openmp.org. With OpenMP it is easy to add parallelism to a program incrementally. Because the programming is pragma-based, your code can still be built on compilers that don't support OpenMP; such a compiler simply ignores the pragmas, at most issuing a warning that it has found an unsupported pragma.
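The incremental, pragma-based style might look like the sketch below. It assumes a C99 compiler; `sum_of_squares()` and `available_threads()` are hypothetical names. Compilers define the `_OPENMP` macro when OpenMP is enabled, which lets the same file build with or without OpenMP support.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sum of squares of x[0..n). The reduction clause gives each thread a
 * private copy of acc and combines the copies at the end, so the threads
 * never race on the shared total. */
double sum_of_squares(const double *x, long n)
{
    double acc = 0.0;
#pragma omp parallel for reduction(+:acc)
    for (long i = 0; i < n; ++i)
        acc += x[i] * x[i];
    return acc;
}

/* Upper bound on the threads OpenMP would use; 1 in a non-OpenMP build. */
int available_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Built with, for example, `gcc -fopenmp`, the loop is shared between threads; without the flag GCC ignores the pragma (warning about it under `-Wall`) and the loop runs serially. Note that a parallel reduction may add floating-point values in a different order, so results can differ in the last bits from the serial build.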
As stated earlier, functional parallelism is potentially more interesting than data parallelism when developing an embedded application. An alternative to using OpenMP is to use one of the newly emerging language extensions which supply similar functionality. It is expected that eventually such language extensions will be adopted by an appropriate standards committee. An experimental compiler with such extensions can be found at www.whatif.intel.com.
An alternative to traditional programming languages is a graphical development environment. There are a number of 'program by drawing' development tools that take care of all the low-level threading implementation for the developer.
One example is National Instruments' LabVIEW, which allows the programmer to design a program diagrammatically by connecting a number of objects together. Adding multi-core support can be as simple as adding a loop block to the diagram.
When programs run in parallel, they can be very difficult to debug, especially with tools that are not parallelism-aware. Identifying and debugging issues related to shared resources and shared variables, synchronisation between threads, and deadlocks and livelocks is notoriously difficult.
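Shared variables are the classic source of such bugs. The sketch below, using POSIX threads with hypothetical names, increments one counter from several threads; the mutex makes each increment atomic with respect to the others. Removing the lock/unlock pair produces a data race that typically loses updates. On a desktop machine, building with `-fsanitize=thread` under GCC or Clang is one readily available way to see such a race reported.

```c
#include <pthread.h>

#define INCREMENTS_PER_THREAD 100000
#define MAX_WORKERS 16

/* A shared counter and the mutex that guards it. */
typedef struct {
    long count;
    pthread_mutex_t lock;
} shared_counter;

static void *increment_worker(void *arg)
{
    shared_counter *c = arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; ++i) {
        pthread_mutex_lock(&c->lock);   /* remove this pair to reproduce the race */
        c->count++;
        pthread_mutex_unlock(&c->lock);
    }
    return NULL;
}

/* Runs up to MAX_WORKERS threads against one counter; returns its final value. */
long run_counter(int n_threads)
{
    pthread_t threads[MAX_WORKERS];
    shared_counter c = { .count = 0 };
    int started = 0;

    pthread_mutex_init(&c.lock, NULL);
    if (n_threads > MAX_WORKERS)
        n_threads = MAX_WORKERS;
    for (int i = 0; i < n_threads; ++i)
        if (pthread_create(&threads[started], NULL, increment_worker, &c) == 0)
            started++;
    for (int i = 0; i < started; ++i)
        pthread_join(threads[i], NULL);
    pthread_mutex_destroy(&c.lock);
    return c.count;
}
```

With the lock in place, four threads always produce exactly 4 x 100,000 increments; without it, the final count usually falls short and varies from run to run.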
However, there is now a growing number of tools from different vendors specifically designed to aid the debugging and tuning of parallel applications. Intel Thread Checker and Intel Thread Profiler are examples of tools that can be used to debug and tune parallel programs.
Where no parallel debugging tools are available for your embedded target, it is legitimate to use standard desktop tools, carrying out the first set of tests on a desktop rather than on the target. It is common to find that threading issues which appear on the target can first be captured by running the application code on a desktop machine.
Stephen Blair-Chappell is a Technical Consulting Engineer at Intel Compiler Labs.
XPort®
Build Serial to Ethernet Connectivity and Control into Your Products, Quickly and Simply
XPort® is a compact, integrated solution for web-enabling virtually any device with serial capability. By incorporating XPort into a product design, manufacturers can quickly and easily offer serial-to-Ethernet networking as a standard feature, so that their products can be accessed and controlled over the Internet.
Full Networking in a Tiny Package
The XPort embedded device server removes the complexity of designing network connectivity into a product by incorporating all of the required hardware and software in a single embedded Ethernet solution. Smaller than your thumb, it includes all essential networking features: a 10Base-T/100Base-TX Ethernet connection, a proven operating system, an embedded web server, e-mail alerts, a full TCP/IP protocol stack and 256-bit AES encryption for secure communications. This easy-to-embed networking processor module lets engineers focus on their core competency while reducing development time and cost and increasing product value.
Integrated Network Communications Module
XPort is powered by our DSTni™ network processor SoC, which includes a 10/100 MAC/PHY and 256 KB of SRAM. It features a built-in web server for communicating with a device via a standard Internet browser; web capability can be used for remote configuration, real-time monitoring or troubleshooting. XPort has 512 KB of on-module flash for web pages and software upgrades. It acts as a dedicated co-processor that handles network activity, allowing the host microprocessor to function at maximum efficiency.
Building Intelligent Devices
With XPort you can embed intelligence into any electronic product for applications such as:
Remote diagnostics and upgrades
Asset tracking and replenishment
Automation and control
Power management
Remote collaboration
Personalized content delivery
Robust, Feature-Rich Software Suite
Eliminating the need to negotiate the intricacies of Transmission Control Protocol (TCP) or Internet Protocol (IP), XPort incorporates:
Robust Real Time Operating System (RTOS)
Full-featured network protocol stack
Proven, ready-to-use serial-to-wireless application
Built-in web server for device communication and configuration via a standard browser
The Windows-based DeviceInstaller™ makes configuring one or more XPorts in a subnet quick and easy.
Install and configure XPort and load firmware
Assign IP & other network specific addresses
Set wireless parameters
Load custom web pages and view specific device data
Enable web-based configuration of the device
Ping or query the attached device(s) over the network
Allow Telnet communication with the device(s)
Order Information & Part Numbers
Lantronix products are available from a wide variety of leading technology vendors.
To speak with a Lantronix sales representative in North America, call +1 (800) 526-8764. For a full listing of Lantronix worldwide offices, please see our Contact Information page.
Model | Part Number | Description
XPort XE (min. quantity 50 units) | XP1001000-03R | XPort RoHS Extended Temperature
XPort XE (min. quantity 50 units) | XP1001001-03R | XPort RoHS Commercial Temperature
XPort XE (min. quantity 50 units) | XP1001000M-03R | XPort XE RoHS Extended Temperature, with MODBUS
XPort SE (min. quantity 50 units) | XP1002000-03R | XPort RoHS Extended Temperature, with Encryption
XPort SE (min. quantity 50 units) | XP1002001-03R | XPort RoHS Commercial Temperature, with Encryption
XPort SMPL | XP100200S-03R | XPort RoHS Extended Temperature, with Encryption - Sample
XPort 485 (min. quantity 50 units) | XP1004000-03R | XPort RS-485 RoHS Extended Temperature, with Encryption
XPort 485 SMPL | XP100400S-03R | XPort RS-485 RoHS Extended Temperature, with Encryption - Sample
XPort Evaluation Kit | XP100200K-03 | XPort Evaluation Kit
Features & Specifications
Serial Interface
Interface: CMOS (asynchronous, 5 V tolerant)
Data rates: 300 bps to 921,600 bps
Characters: 7 or 8 data bits
Parity: odd, even, none
Stop bits: 1 or 2
Control signals: DTR/DCD, CTS, RTS
Flow control: XON/XOFF, RTS/CTS
Programmable I/O: 3 PIO pins (software selectable)
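When sizing a serial-to-Ethernet link it can help to turn the line rate into character throughput. The sketch below assumes standard asynchronous framing with one start bit per character; `serial_config` and the two functions are hypothetical and not part of any Lantronix software.

```c
/* Hypothetical description of an asynchronous serial frame setting. */
enum parity_mode { PARITY_NONE, PARITY_ODD, PARITY_EVEN };

typedef struct {
    long baud;               /* line rate in bits per second */
    int data_bits;           /* 7 or 8 */
    enum parity_mode parity; /* odd and even each add one bit per character */
    int stop_bits;           /* 1 or 2 */
} serial_config;

/* Returns 1 if cfg lies within the ranges listed above, else 0. It does not
 * check that the rate is one of the discrete rates the firmware offers. */
int serial_config_in_range(const serial_config *cfg)
{
    return cfg->baud >= 300 && cfg->baud <= 921600
        && (cfg->data_bits == 7 || cfg->data_bits == 8)
        && (cfg->parity == PARITY_NONE || cfg->parity == PARITY_ODD
            || cfg->parity == PARITY_EVEN)
        && (cfg->stop_bits == 1 || cfg->stop_bits == 2);
}

/* Characters per second: each character costs one start bit plus its data,
 * parity and stop bits. */
double serial_bytes_per_second(const serial_config *cfg)
{
    int bits_per_char = 1 + cfg->data_bits
                      + (cfg->parity != PARITY_NONE ? 1 : 0)
                      + cfg->stop_bits;
    return (double)cfg->baud / bits_per_char;
}
```

For example, at the maximum 921,600 bps with 8 data bits, no parity and one stop bit (10 bits per character), the line carries at most 92,160 characters per second, before any flow-control pauses.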
Network Interface
Interface: Ethernet 10Base-T or 100Base-TX (auto-sensing)
Connector: RJ45
Protocols: TCP/IP, UDP/IP, ARP, ICMP, SNMP, TFTP, Telnet, DHCP, BOOTP, HTTP and AutoIP
Indicators (LED):
10Base-T connection
100Base-TX connection
Link & activity indicator, full/half duplex
Management: SNMP, Telnet, serial, internal web server, and Microsoft Windows®-based utility for configuration
Security:
Password protection
Optional 256-bit AES Rijndael encryption
Internal Web Server
Storage capacity: 384 KB for web pages
Architecture
CPU: based on the DSTni-EX enhanced 16-bit x86 architecture, 48 MHz or 88 MHz
Memory: 256 KB SRAM and 512 KB flash
Firmware: upgradeable via TFTP and serially
Power
Input voltage: 3.3 VDC
Environmental
Extended temperature: -40° to 85°C (-40° to 185°F)
Commercial temperature: 0° to 70°C (32° to 158°F)
Storage: -40° to 85°C (-40° to 185°F)
Packaging
Dimensions: 33.9 x 16.25 x 13.5 mm (1.33 x 0.64 x 0.53 in)
Weight: 9.6 g (0.34 oz)
Included Software
Web manager, Windows®-based DeviceInstaller configuration software and Com Port Redirector.
WiPort®
Build Embedded 802.11 b/g Wireless Networking Into Your Products!
A compact, integrated hardware and software module, WiPort® enables you to build wireless networking into virtually any electronic device with serial or Ethernet capability. With WiPort your products can be wirelessly accessed and controlled over a network or the Internet!
The matchbook-sized WiPort takes the complexity out of RF design and embedded Ethernet networking, enabling engineers to focus on their core competency of designing products. It minimizes engineering risk, reduces cost and shortens development time. Just apply power and connect a UART, and the product is wireless and network-ready!
Complete Wireless Network Processing Module
Powered by a Lantronix DSTni™ Ethernet processor SoC that includes a 10Base-T/100Base-TX MAC/PHY and 256 KB of on-chip SRAM, WiPort also includes a complete 802.11 b/g radio and 2MB of Flash memory for web page storage and system upgrades. WiPort is a dedicated co-processor module that optimizes network activity, permitting the device’s host microprocessor to function at maximum efficiency. WiPort connects through its coaxial “pigtail” to an external panel-mounted antenna for rapid electromechanical integration. WiPort works with serial or Ethernet interface devices. SPI, I2C, USB or CAN connectivity can be enabled as a future option.
Bulletproof Security
With IEEE 802.11i-PSK or WPA (PSK, TKIP) encryption, WiPort offers heightened security. WiPort also supports 256-bit Advanced Encryption Standard (Rijndael) encryption for true end-to-end (wired to wireless to wired) secure data transfer.
Robust, Feature-Rich Software Suite
Eliminating the need to negotiate the intricacies of Transmission Control Protocol (TCP) or Internet Protocol (IP), WiPort incorporates:
Robust Real Time Operating System (RTOS)
Full-featured network protocol stack
Proven, ready-to-use serial-to-wireless application
Built-in web server for device communication and configuration via a standard browser
The Windows-based DeviceInstaller™ makes configuring one or more WiPorts in a subnet quick and easy.
Install and configure WiPort and load firmware
Assign IP & other network specific addresses
Set wireless parameters
Load custom web pages and view specific device data
Enable web-based configuration of the device
Ping or query the attached device(s) over the network
Allow Telnet communication with the device(s)
FCC Certified for Immediate Deployment
WiPort is certified by the U.S. Federal Communications Commission (FCC). This allows you to carry the Lantronix WiPort FCC grant on your product label and bypass 802.11 regulatory testing, which accelerates time-to-market and reduces development and testing costs. WiPort is also pre-tested for European telecommunications regulations.
Ethernet-to-Wireless Bridging
With a separate Ethernet port, WiPort offers the unique ability to transparently bridge existing Ethernet-ready devices to a wireless network.
Scan, Gather and Report Radio Parameters
With its Scan command, WiPort can report MAC address, RSSI and SSID, which are extremely useful during site survey work. The Network Status command additionally reports the channel, infrastructure/ad hoc mode, security type, authentication and negotiated encryption types for the current association.
IP and Ethernet Interfaces
There is a general consensus that in years to come more and more Internet devices will be embedded and not PC oriented. Just one such prediction is that by 2010, 95% of Internet-connected devices will not be computers. So if they are not computers, what will they be? Embedded Internet devices.
One popular solution is to use an 8-bit microcontroller such as a Rabbit 2000, AVR or PIC with an Ethernet controller such as a CS8900A or RTL8029AS hanging off its parallel port pins in 8-bit mode. A TCP/IP stack, normally written in C, can be stripped of features and ported to these resource-limited microcontrollers. While this works, and we detail many such boards below, a little debate is brewing over its reliability and functionality.
With DoS (denial of service) attacks becoming more and more common, it doesn't take much to knock your little 8-bit microcontroller off the network. In fact, some configurations have trouble keeping up with the high volume of broadcast packets floating around a loaded network, let alone any malicious attacks.
One solution, of course, is to use a bigger processor, as in embedded Linux devices based on ColdFire, DragonBall or ARM. These are powerful enough to sustain suitable bandwidth and are far less susceptible to someone's malicious intent.
The other solution is to use a hardware TCP/IP stack. A hardware-based stack is not new. If you have followed this site, you will be aware of the Seiko S-7600A hardware stack, which combined a TCP/IP stack with a PPP controller so that you could connect it to a modem. Seiko had licensed the technology from iReady Corporation. While it had its place in data logging or dial-on-demand applications, where your device could dial up the Internet and email you that your house had been broken into, or send the past 24 hours of logged data, it could not connect to the Ethernet networks that are everywhere today.
The next logical progression had to be an Ethernet interface. Seiko has since exited the embedded Internet business, discontinuing its S-7600 on 1st September. However, the concept is still alive.
A hardware TCP/IP stack has a couple of advantages. Firstly, because they are hardware-based, most run at close to line speed, encapsulating and stripping streams of data on the fly. This makes a DoS attack much more difficult, and makes running malicious code through buffer overruns and similar techniques almost impossible. However, being hardware also makes them difficult to upgrade should quirks be found that allow, say, SYN attacks.
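To make the buffer-overrun point concrete: a software stack becomes exploitable when it trusts a length field taken from the packet itself. The sketch below shows the kind of defensive check involved, using the UDP header layout from RFC 768 (the 16-bit length field at byte offset 4 covers header plus payload). `udp_copy_payload()` is a hypothetical name, and the sketch deliberately ignores the checksum and the IP layer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UDP_HEADER_LEN 8

/* Copies the payload of a UDP datagram into a caller-supplied buffer.
 * pkt points at the UDP header and pkt_len is the number of bytes actually
 * received. Returns the payload length, or -1 if the datagram is malformed
 * or the payload would not fit in out. */
long udp_copy_payload(const uint8_t *pkt, size_t pkt_len,
                      uint8_t *out, size_t out_cap)
{
    if (pkt_len < UDP_HEADER_LEN)
        return -1;                          /* truncated header */

    /* Length field: bytes 4-5, big-endian, counts header plus payload. */
    size_t udp_len = ((size_t)pkt[4] << 8) | pkt[5];
    if (udp_len < UDP_HEADER_LEN || udp_len > pkt_len)
        return -1;                          /* field disagrees with what arrived */

    size_t payload_len = udp_len - UDP_HEADER_LEN;
    if (payload_len > out_cap)
        return -1;                          /* would overrun the destination */

    memcpy(out, pkt + UDP_HEADER_LEN, payload_len);
    return (long)payload_len;
}
```

The function returns `long` rather than `int` because on many 8-bit toolchains `int` is only 16 bits wide and could not hold the largest possible payload length.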
Later we detail some devices from Ipsil and Connect One. Both can have new firmware uploaded, which future-proofs designs built around these peripheral devices. However, the Ipsil and Connect One devices on the market today rely on an external Ethernet controller such as the popular CS8900A or RTL8029AS, which adds to the chip count.
Ipsil has preliminary data on its IPµ8932, which combines a web server, Ethernet MAC layer and TCP/IP controller on a single chip. This allows the one chip, with 20 digital or analog inputs, to serve web pages without the need for a microcontroller. Ipsil's WebHoles™ technology allows holes (similar in principle to server-side includes) to be filled in with values from the I/O ports. If you do need more complexity, you can add a microcontroller and talk to the chip via standard TCP/IP socket calls.
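The "holes" idea can be illustrated with a small template filler. Ipsil's actual WebHoles syntax is not described here, so the `%Pn%` marker below (n being an I/O port number from 0 to 9) is purely hypothetical, as is `fill_holes()`; the sketch only shows the principle of substituting live port values into a stored page.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Copies tmpl to out, replacing each hypothetical hole marker "%Pn%"
 * (n = 0..9, n < n_ports) with the decimal value of port_values[n].
 * Markers for unknown ports are copied unchanged. Returns 0 on success,
 * or -1 if the result plus its terminating NUL would exceed out_cap
 * (out's contents are then unspecified). */
int fill_holes(const char *tmpl, const int *port_values, size_t n_ports,
               char *out, size_t out_cap)
{
    size_t used = 0;

    if (out_cap == 0)
        return -1;
    while (*tmpl != '\0') {
        /* Short-circuit evaluation stops before reading past the NUL. */
        int is_hole = tmpl[0] == '%' && tmpl[1] == 'P'
                   && tmpl[2] >= '0' && tmpl[2] <= '9' && tmpl[3] == '%'
                   && (size_t)(tmpl[2] - '0') < n_ports;
        if (is_hole) {
            char digits[16];
            int len = snprintf(digits, sizeof digits, "%d",
                               port_values[tmpl[2] - '0']);
            if (len < 0 || used + (size_t)len >= out_cap)
                return -1;
            memcpy(out + used, digits, (size_t)len);
            used += (size_t)len;
            tmpl += 4;                 /* skip the whole marker */
        } else {
            if (used + 1 >= out_cap)
                return -1;
            out[used++] = *tmpl++;
        }
    }
    out[used] = '\0';
    return 0;
}
```

For example, with port values {42, 7}, the template `<p>T=%P0% H=%P1%</p>` becomes `<p>T=42 H=7</p>`.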
However, WIZnet Inc. already has a similar device on the market. The W3150A incorporates a TCP/IP stack and a future-proofed 10/100 Ethernet MAC. So when it comes to chip count, it makes sense to offload the burden of the TCP/IP stack to a second peripheral chip complete with Ethernet MAC. This can reduce time to market, as you no longer need to develop a TCP/IP stack (or pay to license one), and you get a more stable product. Your 8-bit micro effectively has more grunt, as it is no longer responsible for the lower TCP/IP protocols and Ethernet encapsulation. All these advantages, and still only two chips.
How long before the leading microcontroller manufacturers integrate a hardware TCP/IP stack and Ethernet MAC into their microcontrollers, making a one-chip solution?
Answer: WIZnet Inc. and Atmel Corporation to jointly develop and market Internet connectivity solutions.
WIZnet Inc. and Atmel Corporation have forged a strategic partnership to develop and co-market Internet connectivity solutions. As part of this agreement, WIZnet will manufacture OEM products around Atmel's AVR microcontrollers. Both have agreed to move towards system-on-chip (SoC) designs that will see WIZnet's hardwired TCP/IP technology integrated with Atmel's MCU cores. The outcome? An AVR with a hardware TCP/IP stack and Ethernet on one chip. I can't wait...