Channel: RapidMiner Studio Forum topics

change attributes order


Hi there 

 

I have a data set with patient_ID and a number of other attributes, one per specific disease, where each column contains 1/0:

ID         Diabetes        Hypertension     Depression 

102            1                        0                       1

 

How can I generate a new attribute that lists all current diseases for each patient_ID, like this:

ID         Disease   

102      Diabetes, Depression 

 

Any thoughts?
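For readers who prefer to prototype this outside Studio, a pandas sketch of the desired transformation (the toy frame and column names simply mirror the example above) joins the names of the 1-valued columns per row:

```python
import pandas as pd

# Toy frame mirroring the example in the post: one 1/0 column per disease.
df = pd.DataFrame({
    "ID": [102, 103],
    "Diabetes": [1, 0],
    "Hypertension": [0, 1],
    "Depression": [1, 0],
})

disease_cols = ["Diabetes", "Hypertension", "Depression"]

# For each row, keep the names of the columns whose value is 1
# and join them into one comma-separated string.
df["Disease"] = df[disease_cols].apply(
    lambda row: ", ".join(c for c in disease_cols if row[c] == 1),
    axis=1,
)

result = df[["ID", "Disease"]]
```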

Thanks,

Abbas


Using decision tree operator to predict patterns of multiple data sets


Dear Sirs/Madams,

 

I have thousands of data sets, and each one contains the occupancy pattern of an individual household at 5-minute intervals over two consecutive years. I would like to use the Decision Tree operator to predict the occupancy patterns from my data sets. However, it seems that the Decision Tree operator can only be connected to one data set. How can I connect the operator to multiple data sets?
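One common approach is to append the individual data sets into a single example set before the learner. The same idea, sketched in Python with pandas and scikit-learn (the `hour`/`occupied` columns are hypothetical stand-ins for the real 5-minute occupancy data):

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical per-household frames; in practice these would be
# loaded in a loop over the thousands of files.
households = [
    pd.DataFrame({"hour": [0, 6, 12, 18], "occupied": [1, 0, 0, 1]}),
    pd.DataFrame({"hour": [0, 6, 12, 18], "occupied": [1, 1, 0, 1]}),
]

# Stack all households into one training table, the analogue of
# appending the example sets before a single tree learner.
data = pd.concat(households, ignore_index=True)

tree = DecisionTreeClassifier(random_state=0)
tree.fit(data[["hour"]], data["occupied"])
pred = tree.predict(data[["hour"]])
```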

 

Thank you very much.

Best regards,

Lisa

Wrong Credentials


Hi! I am having trouble opening RapidMiner; it keeps showing the message "please enter a valid username and password to access the SOCKS proxy 127.0.0.1:54761". Please help!

Thank You!

Two classification models in one process?


Hello RM Community,

 

I've been lurking around the forums for quite some time now and I am still learning this new tool. I just wanted to ask: is it possible to run two classification models (e.g. k-NN and decision tree) in one process? If yes, is it possible to get separate performance results for each?
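The general idea of training two learners on the same data and scoring each one separately can be sketched in scikit-learn (a stand-in for the RapidMiner operators; the iris data set is just a placeholder):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Placeholder data; in Studio this would be the shared example set.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Train both models on the same training data.
knn = KNeighborsClassifier().fit(X_tr, y_tr)
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Evaluate each model separately on the same test data.
acc_knn = accuracy_score(y_te, knn.predict(X_te))
acc_tree = accuracy_score(y_te, tree.predict(X_te))
```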

 

Sorry if it's a newbie question. 

 

Thanks in advance!

Rem

LDA Optimization


Hi guys, any idea how best to tweak the parameters when optimizing the LDA model? I'm playing around with this example using RSS news feeds and am not 100% sure whether the Optimize operator works well enough for small numbers of topics. Do you think the data set simply isn't large enough?

 

Please note: increasing the start and end values for the number of topics gives better results. I want to build a generic example of usage, so any pointers are welcome.
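The grid idea itself can be sketched outside Studio. Here scikit-learn's LDA is fitted for several topic counts and each fit is scored by perplexity (lower is better), a rough analogue of the Optimize Parameters (Grid) loop; the toy corpus merely mimics the RSS headlines:

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus standing in for the appended RSS headlines.
docs = [
    "markets rally as shares rise",
    "new film wins top award",
    "shares fall on weak earnings",
    "actor joins award ceremony",
] * 10

X = CountVectorizer().fit_transform(docs)

# Grid over the number of topics, keeping the fit with the
# lowest perplexity on the training data.
best_k, best_perplexity = None, None
for k in (2, 3, 4):
    lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(X)
    p = lda.perplexity(X)
    if best_perplexity is None or p < best_perplexity:
        best_k, best_perplexity = k, p
```

Scoring on held-out documents rather than the training data would be the more defensible choice for picking the topic count.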

 

<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><context><input/><output/><macros/></context><operator activated="true" class="process" compatibility="8.1.001" expanded="true" name="Process"><process expanded="true"><operator activated="true" class="subprocess" compatibility="8.1.001" expanded="true" height="103" name="Get News Feeds" width="90" x="45" y="34"><process expanded="true"><operator activated="true" class="web:read_rss" compatibility="7.3.000" expanded="true" height="68" name="BBC Top Stories" width="90" x="45" y="34"><parameter key="url" value="http://feeds.bbci.co.uk/news/rss.xml"/></operator><operator activated="true" class="web:read_rss" compatibility="7.3.000" expanded="true" height="68" name="BBC Asia" width="90" x="45" y="85"><parameter key="url" value="http://feeds.bbci.co.uk/news/world/asia/rss.xml"/></operator><operator activated="true" class="web:read_rss" compatibility="7.3.000" expanded="true" height="68" name="BBC Business" width="90" x="45" y="136"><parameter key="url" value="http://feeds.bbci.co.uk/news/business/rss.xml"/></operator><operator activated="true" class="web:read_rss" compatibility="7.3.000" expanded="true" height="68" name="BBC Entertainment" width="90" x="45" y="187"><parameter key="url" value="http://feeds.bbci.co.uk/news/entertainment_and_arts/rss.xml"/></operator><operator activated="true" class="append" compatibility="8.1.001" expanded="true" height="145" name="Append" width="90" x="179" y="34"/><operator activated="true" class="generate_copy" compatibility="8.1.001" expanded="true" height="82" name="Generate Copy" width="90" x="313" y="34"><parameter key="attribute_name" value="Title"/><parameter key="new_name" value="Title2"/></operator><operator activated="true" class="text_to_nominal" compatibility="8.1.001" expanded="true" height="82" name="Text to Nominal" width="90" x="447" y="34"><parameter key="attribute_filter_type" value="subset"/><parameter key="attributes" value="Link|Title2"/><description 
align="center" color="transparent" colored="false" width="126">Don't convert article link to document text.</description></operator><operator activated="true" class="split_data" compatibility="8.1.001" expanded="true" height="103" name="Split Data" width="90" x="581" y="34"><enumeration key="partitions"><parameter key="ratio" value="0.7"/><parameter key="ratio" value="0.3"/></enumeration><parameter key="sampling_type" value="shuffled sampling"/><description align="center" color="transparent" colored="false" width="126">Randomly sort the data from the feeds. Split into training &amp;amp; testing.</description></operator><connect from_op="BBC Top Stories" from_port="output" to_op="Append" to_port="example set 1"/><connect from_op="BBC Asia" from_port="output" to_op="Append" to_port="example set 2"/><connect from_op="BBC Business" from_port="output" to_op="Append" to_port="example set 3"/><connect from_op="BBC Entertainment" from_port="output" to_op="Append" to_port="example set 4"/><connect from_op="Append" from_port="merged set" to_op="Generate Copy" to_port="example set input"/><connect from_op="Generate Copy" from_port="example set output" to_op="Text to Nominal" to_port="example set input"/><connect from_op="Text to Nominal" from_port="example set output" to_op="Split Data" to_port="example set"/><connect from_op="Split Data" from_port="partition 1" to_port="out 1"/><connect from_op="Split Data" from_port="partition 2" to_port="out 2"/><portSpacing port="source_in 1" spacing="0"/><portSpacing port="sink_out 1" spacing="0"/><portSpacing port="sink_out 2" spacing="0"/><portSpacing port="sink_out 3" spacing="0"/></process></operator><operator activated="true" class="text:data_to_documents" compatibility="8.1.000" expanded="true" height="68" name="Data to Documents (2)" width="90" x="179" y="289"><list key="specify_weights"/></operator><operator activated="true" class="text:data_to_documents" compatibility="8.1.000" expanded="true" height="68" name="Data to 
Documents" width="90" x="179" y="34"><list key="specify_weights"/></operator><operator activated="true" class="loop_collection" compatibility="8.1.001" expanded="true" height="82" name="Loop Collection" width="90" x="313" y="34"><process expanded="true"><operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="45" y="34"/><operator activated="true" class="text:transform_cases" compatibility="8.1.000" expanded="true" height="68" name="Transform Cases" width="90" x="179" y="34"/><operator activated="true" class="text:filter_by_length" compatibility="8.1.000" expanded="true" height="68" name="Filter Tokens (by Length)" width="90" x="313" y="34"><parameter key="min_chars" value="2"/></operator><operator activated="true" class="text:filter_stopwords_english" compatibility="8.1.000" expanded="true" height="68" name="Filter Stopwords (English)" width="90" x="447" y="34"/><operator activated="true" class="text:generate_n_grams_terms" compatibility="8.1.000" expanded="true" height="68" name="Generate n-Grams (Terms)" width="90" x="581" y="34"/><connect from_port="single" to_op="Tokenize" to_port="document"/><connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/><connect from_op="Transform Cases" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/><connect from_op="Filter Tokens (by Length)" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/><connect from_op="Filter Stopwords (English)" from_port="document" to_op="Generate n-Grams (Terms)" to_port="document"/><connect from_op="Generate n-Grams (Terms)" from_port="document" to_port="output 1"/><portSpacing port="source_single" spacing="0"/><portSpacing port="sink_output 1" spacing="0"/><portSpacing port="sink_output 2" spacing="0"/></process><description align="center" color="transparent" colored="false" width="126">Text Prep using Text 
Mining</description></operator><operator activated="true" class="loop_collection" compatibility="8.1.001" expanded="true" height="82" name="Loop Collection (3)" width="90" x="313" y="238"><process expanded="true"><operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize (3)" width="90" x="45" y="34"/><operator activated="true" class="text:transform_cases" compatibility="8.1.000" expanded="true" height="68" name="Transform Cases (2)" width="90" x="179" y="34"/><operator activated="true" class="text:filter_by_length" compatibility="8.1.000" expanded="true" height="68" name="Filter Tokens (3)" width="90" x="313" y="34"><parameter key="min_chars" value="2"/></operator><operator activated="true" class="text:filter_stopwords_english" compatibility="8.1.000" expanded="true" height="68" name="Filter Stopwords (3)" width="90" x="447" y="34"/><operator activated="true" class="text:generate_n_grams_terms" compatibility="8.1.000" expanded="true" height="68" name="Generate n-Grams (2)" width="90" x="581" y="34"/><connect from_port="single" to_op="Tokenize (3)" to_port="document"/><connect from_op="Tokenize (3)" from_port="document" to_op="Transform Cases (2)" to_port="document"/><connect from_op="Transform Cases (2)" from_port="document" to_op="Filter Tokens (3)" to_port="document"/><connect from_op="Filter Tokens (3)" from_port="document" to_op="Filter Stopwords (3)" to_port="document"/><connect from_op="Filter Stopwords (3)" from_port="document" to_op="Generate n-Grams (2)" to_port="document"/><connect from_op="Generate n-Grams (2)" from_port="document" to_port="output 1"/><portSpacing port="source_single" spacing="0"/><portSpacing port="sink_output 1" spacing="0"/><portSpacing port="sink_output 2" spacing="0"/></process><description align="center" color="transparent" colored="false" width="126">Text Prep using Text Mining</description></operator><operator activated="true" class="concurrency:optimize_parameters_grid" 
compatibility="8.1.001" expanded="true" height="187" name="Optimize Parameters (Grid)" width="90" x="514" y="34"><list key="parameters"><parameter key="LDA.number_of_topics" value="[5;20;5;linear]"/></list><process expanded="true"><operator activated="true" class="operator_toolbox:lda" compatibility="1.0.000" expanded="true" height="124" name="LDA" width="90" x="179" y="34"><parameter key="number_of_topics" value="20"/><parameter key="iterations" value="100"/><parameter key="use_local_random_seed" value="true"/><parameter key="local_random_seed" value="1997"/></operator><operator activated="true" class="generate_direct_mailing_data" compatibility="8.1.001" expanded="true" height="68" name="Generate Direct Mailing Data" width="90" x="112" y="238"/><operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.1.001" expanded="true" height="103" name="Decision Tree" width="90" x="246" y="238"><description align="center" color="transparent" colored="false" width="126">This is because the optimize operator doesn't recognize the LDA model type.</description></operator><connect from_port="input 1" to_op="LDA" to_port="col"/><connect from_op="LDA" from_port="exa" to_port="output 1"/><connect from_op="LDA" from_port="top" to_port="output 2"/><connect from_op="LDA" from_port="mod" to_port="output 3"/><connect from_op="LDA" from_port="per" to_port="performance"/><connect from_op="Generate Direct Mailing Data" from_port="output" to_op="Decision Tree" to_port="training set"/><connect from_op="Decision Tree" from_port="model" to_port="model"/><portSpacing port="source_input 1" spacing="0"/><portSpacing port="source_input 2" spacing="0"/><portSpacing port="sink_performance" spacing="0"/><portSpacing port="sink_model" spacing="0"/><portSpacing port="sink_output 1" spacing="0"/><portSpacing port="sink_output 2" spacing="0"/><portSpacing port="sink_output 3" spacing="0"/><portSpacing port="sink_output 4" spacing="0"/></process></operator><operator 
activated="true" class="operator_toolbox:apply_model_documents" compatibility="1.0.000" expanded="true" height="103" name="Apply Model (Documents)" width="90" x="581" y="238"/><connect from_op="Get News Feeds" from_port="out 1" to_op="Data to Documents" to_port="example set"/><connect from_op="Get News Feeds" from_port="out 2" to_op="Data to Documents (2)" to_port="example set"/><connect from_op="Data to Documents (2)" from_port="documents" to_op="Loop Collection (3)" to_port="collection"/><connect from_op="Data to Documents" from_port="documents" to_op="Loop Collection" to_port="collection"/><connect from_op="Loop Collection" from_port="output 1" to_op="Optimize Parameters (Grid)" to_port="input 1"/><connect from_op="Loop Collection (3)" from_port="output 1" to_op="Apply Model (Documents)" to_port="doc"/><connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 2"/><connect from_op="Optimize Parameters (Grid)" from_port="output 1" to_port="result 3"/><connect from_op="Optimize Parameters (Grid)" from_port="output 2" to_port="result 4"/><connect from_op="Optimize Parameters (Grid)" from_port="output 3" to_op="Apply Model (Documents)" to_port="mod"/><connect from_op="Apply Model (Documents)" from_port="exa" to_port="result 1"/><portSpacing port="source_input 1" spacing="0"/><portSpacing port="sink_result 1" spacing="84"/><portSpacing port="sink_result 2" spacing="0"/><portSpacing port="sink_result 3" spacing="0"/><portSpacing port="sink_result 4" spacing="0"/><portSpacing port="sink_result 5" spacing="0"/></process></operator></process>

Does RapidMiner have the SPADE algorithm for sequential pattern mining?


Hi RapidMiner,

 

SPADE (Sequential Pattern Discovery using Equivalence classes) is another algorithm for sequential pattern mining besides GSP, and CM-SPADE is an improved version of SPADE that uses co-occurrence information. May I know whether RapidMiner has either SPADE or CM-SPADE? If not, do you plan to implement these two? I've seen articles saying SPADE is faster than GSP on large data sets.

 

Thank you very much for this information.

 

P.S.: below are links to the papers on SPADE and CM-SPADE:

 

https://link.springer.com/content/pdf/10.1023/A:1007652502315.pdf

 

https://www.philippe-fournier-viger.com/spmf/PAKDD2014_sequential_pattern_mining_CM-SPADE_CM-SPAM.pdf

 

Best Regards,

Phi Vu

how to change credentials password


Hello! I would like to know how to change my credentials password. I want to access the Marketplace but I can't log in because it says either my username or password is wrong. How can I update my credentials?

Thank you

Problem with Neural Net "use local random seed"


Hi guys, I have a small problem.

 

How is it possible that, when "use local random seed" on the Neural Net operator is not enabled (unchecked), the same NN process does NOT produce the same results?

 

What is the purpose of "use local random seed"?
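For context on why unchecked means non-repeatable: without a fixed seed the network's initial weights are drawn from a fresh random state on every run, so training converges to slightly different weights each time; fixing the seed makes the initialization, and hence the whole run, repeatable. A minimal NumPy sketch of the effect (the uniform initialization is an illustrative assumption, not RapidMiner's actual scheme):

```python
import numpy as np

def init_weights(seed=None):
    # seed=None mimics "use local random seed" unchecked: a fresh
    # random state per run, so the initial weights always differ.
    # A fixed seed makes the initialization repeatable.
    rng = np.random.default_rng(seed)
    # Uniform init is illustrative only, not RapidMiner's scheme.
    return rng.uniform(-1.0, 1.0, size=5)

seeded_a = init_weights(seed=1992)
seeded_b = init_weights(seed=1992)
unseeded_a = init_weights()
unseeded_b = init_weights()
```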

 

This is my XML process:

 

<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="read_csv" compatibility="8.1.001" expanded="true" height="68" name="Read CSV" width="90" x="112" y="187">
<parameter key="csv_file" value="C:\Users\Admin\Desktop\data Example.csv"/>
<parameter key="column_separators" value=";"/>
<parameter key="trim_lines" value="false"/>
<parameter key="use_quotes" value="true"/>
<parameter key="quotes_character" value="&quot;"/>
<parameter key="escape_character" value="\"/>
<parameter key="skip_comments" value="false"/>
<parameter key="comment_characters" value="#"/>
<parameter key="parse_numbers" value="true"/>
<parameter key="decimal_character" value="."/>
<parameter key="grouped_digits" value="false"/>
<parameter key="grouping_character" value=","/>
<parameter key="date_format" value=""/>
<parameter key="first_row_as_names" value="true"/>
<list key="annotations"/>
<parameter key="time_zone" value="SYSTEM"/>
<parameter key="locale" value="English (United States)"/>
<parameter key="encoding" value="SYSTEM"/>
<parameter key="read_all_values_as_polynominal" value="false"/>
<list key="data_set_meta_data_information"/>
<parameter key="read_not_matching_values_as_missings" value="true"/>
<parameter key="datamanagement" value="double_array"/>
<parameter key="data_management" value="auto"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes T" width="90" x="246" y="187">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value="F|D|C|B|A"/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="set_role" compatibility="8.1.001" expanded="true" height="82" name="Set Role T" width="90" x="380" y="187">
<parameter key="attribute_name" value="F"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="multiply" compatibility="8.1.001" expanded="true" height="145" name="Multiply Data T" width="90" x="514" y="187"/>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="neural_net" compatibility="8.1.001" expanded="true" height="82" name="Neural Net" width="90" x="715" y="136">
<list key="hidden_layers">
<parameter key="Hidden" value="21"/>
</list>
<parameter key="training_cycles" value="32000"/>
<parameter key="learning_rate" value="0.1"/>
<parameter key="momentum" value="0.1"/>
<parameter key="decay" value="false"/>
<parameter key="shuffle" value="true"/>
<parameter key="normalize" value="true"/>
<parameter key="error_epsilon" value="1.0E-5"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="apply_model" compatibility="8.1.001" expanded="true" height="82" name="Apply Model NN" width="90" x="849" y="136">
<list key="application_parameters"/>
<parameter key="create_view" value="false"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="performance_regression" compatibility="8.1.001" expanded="true" height="82" name="Performance NN" width="90" x="983" y="136">
<parameter key="main_criterion" value="first"/>
<parameter key="root_mean_squared_error" value="true"/>
<parameter key="absolute_error" value="false"/>
<parameter key="relative_error" value="true"/>
<parameter key="relative_error_lenient" value="false"/>
<parameter key="relative_error_strict" value="false"/>
<parameter key="normalized_absolute_error" value="false"/>
<parameter key="root_relative_squared_error" value="false"/>
<parameter key="squared_error" value="false"/>
<parameter key="correlation" value="true"/>
<parameter key="squared_correlation" value="true"/>
<parameter key="prediction_average" value="false"/>
<parameter key="spearman_rho" value="false"/>
<parameter key="kendall_tau" value="false"/>
<parameter key="skip_undefined_labels" value="true"/>
<parameter key="use_example_weights" value="true"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="neural_net" compatibility="8.1.001" expanded="true" height="82" name="Neural Net (2)" width="90" x="715" y="238">
<list key="hidden_layers">
<parameter key="Hidden" value="21"/>
</list>
<parameter key="training_cycles" value="32000"/>
<parameter key="learning_rate" value="0.1"/>
<parameter key="momentum" value="0.1"/>
<parameter key="decay" value="false"/>
<parameter key="shuffle" value="true"/>
<parameter key="normalize" value="true"/>
<parameter key="error_epsilon" value="1.0E-5"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="apply_model" compatibility="8.1.001" expanded="true" height="82" name="Apply Model NN (2)" width="90" x="849" y="238">
<list key="application_parameters"/>
<parameter key="create_view" value="false"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="performance_regression" compatibility="8.1.001" expanded="true" height="82" name="Performance NN (2)" width="90" x="983" y="238">
<parameter key="main_criterion" value="first"/>
<parameter key="root_mean_squared_error" value="true"/>
<parameter key="absolute_error" value="false"/>
<parameter key="relative_error" value="true"/>
<parameter key="relative_error_lenient" value="false"/>
<parameter key="relative_error_strict" value="false"/>
<parameter key="normalized_absolute_error" value="false"/>
<parameter key="root_relative_squared_error" value="false"/>
<parameter key="squared_error" value="false"/>
<parameter key="correlation" value="true"/>
<parameter key="squared_correlation" value="true"/>
<parameter key="prediction_average" value="false"/>
<parameter key="spearman_rho" value="false"/>
<parameter key="kendall_tau" value="false"/>
<parameter key="skip_undefined_labels" value="true"/>
<parameter key="use_example_weights" value="true"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="neural_net" compatibility="8.1.001" expanded="true" height="82" name="Neural Net (3)" width="90" x="715" y="340">
<list key="hidden_layers">
<parameter key="Hidden" value="21"/>
</list>
<parameter key="training_cycles" value="32000"/>
<parameter key="learning_rate" value="0.1"/>
<parameter key="momentum" value="0.1"/>
<parameter key="decay" value="false"/>
<parameter key="shuffle" value="true"/>
<parameter key="normalize" value="true"/>
<parameter key="error_epsilon" value="1.0E-5"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="apply_model" compatibility="8.1.001" expanded="true" height="82" name="Apply Model NN (3)" width="90" x="849" y="340">
<list key="application_parameters"/>
<parameter key="create_view" value="false"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="performance_regression" compatibility="8.1.001" expanded="true" height="82" name="Performance NN (3)" width="90" x="983" y="340">
<parameter key="main_criterion" value="first"/>
<parameter key="root_mean_squared_error" value="true"/>
<parameter key="absolute_error" value="false"/>
<parameter key="relative_error" value="true"/>
<parameter key="relative_error_lenient" value="false"/>
<parameter key="relative_error_strict" value="false"/>
<parameter key="normalized_absolute_error" value="false"/>
<parameter key="root_relative_squared_error" value="false"/>
<parameter key="squared_error" value="false"/>
<parameter key="correlation" value="true"/>
<parameter key="squared_correlation" value="true"/>
<parameter key="prediction_average" value="false"/>
<parameter key="spearman_rho" value="false"/>
<parameter key="kendall_tau" value="false"/>
<parameter key="skip_undefined_labels" value="true"/>
<parameter key="use_example_weights" value="true"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="neural_net" compatibility="8.1.001" expanded="true" height="82" name="Neural Net 21" width="90" x="715" y="442">
<list key="hidden_layers">
<parameter key="Hidden" value="21"/>
</list>
<parameter key="training_cycles" value="32000"/>
<parameter key="learning_rate" value="0.1"/>
<parameter key="momentum" value="0.1"/>
<parameter key="decay" value="false"/>
<parameter key="shuffle" value="true"/>
<parameter key="normalize" value="true"/>
<parameter key="error_epsilon" value="1.0E-5"/>
<parameter key="use_local_random_seed" value="true"/>
<parameter key="local_random_seed" value="1992"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="apply_model" compatibility="8.1.001" expanded="true" height="82" name="Apply Model NN (4)" width="90" x="849" y="442">
<list key="application_parameters"/>
<parameter key="create_view" value="false"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="performance_regression" compatibility="8.1.001" expanded="true" height="82" name="Performance NN (4)" width="90" x="983" y="442">
<parameter key="main_criterion" value="first"/>
<parameter key="root_mean_squared_error" value="true"/>
<parameter key="absolute_error" value="false"/>
<parameter key="relative_error" value="true"/>
<parameter key="relative_error_lenient" value="false"/>
<parameter key="relative_error_strict" value="false"/>
<parameter key="normalized_absolute_error" value="false"/>
<parameter key="root_relative_squared_error" value="false"/>
<parameter key="squared_error" value="false"/>
<parameter key="correlation" value="true"/>
<parameter key="squared_correlation" value="true"/>
<parameter key="prediction_average" value="false"/>
<parameter key="spearman_rho" value="false"/>
<parameter key="kendall_tau" value="false"/>
<parameter key="skip_undefined_labels" value="true"/>
<parameter key="use_example_weights" value="true"/>
</operator>
</process>

 

Can anyone help me?
Thanks!


Best practices for text mining an academic text


I have long, complex texts that I want to classify into categories such as psychology, history, etc.

Which processing steps would you recommend? E.g. tokenization, n-grams, etc.
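As a rough baseline, tokenization, lowercasing, stop-word removal and uni/bi-grams can all be folded into a single vectorizer feeding a linear classifier. A scikit-learn sketch with a tiny made-up corpus (the texts and labels are purely illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Purely illustrative corpus; real academic texts are far longer.
texts = [
    "memory cognition behavior experiment",
    "war treaty century empire",
    "perception attention stimulus response",
    "revolution dynasty archive chronicle",
]
labels = ["psychology", "history", "psychology", "history"]

# Tokenization, lowercasing, stop-word removal and uni/bi-grams
# all happen inside the vectorizer.
clf = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english", ngram_range=(1, 2)),
    LogisticRegression(),
)
clf.fit(texts, labels)
pred = clf.predict(["empire and treaty in the nineteenth century"])
```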

Thank you

How to change the values of an attribute?


Hi Experts,

I'm starting to work with RapidMiner and I need your help.
I have a database with 23 attributes and I need to change the values of the attribute "outcome". This attribute has the values "Lived", "Died" and "Euthanized"; in other words, after reading the CSV file I need to treat the database by changing the values of the attribute "outcome".
I would like to change every value equal to "Euthanized" to "Died". Which operators can I use?
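The transformation itself is a simple value mapping (in RapidMiner, the Map operator covers this kind of nominal-value replacement). A pandas sketch of the same idea on a toy column:

```python
import pandas as pd

# Toy column standing in for the "outcome" attribute of the real CSV.
df = pd.DataFrame({"outcome": ["Lived", "Died", "Euthanized", "Lived"]})

# Map every "Euthanized" value to "Died", leaving all others unchanged.
df["outcome"] = df["outcome"].replace({"Euthanized": "Died"})
```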

 

Thanks in advance my friends.

 

Marcelo Batista

 

Cassandra driver exception for timestamp


Hi 

I am trying to connect RapidMiner to Cassandra 3.1 and I believe I have found a bug. I am using manual queries to test the connection. Most queries work fine, but reading any timestamp throws an exception.

Cassandra driver Codec not found for requested operation: [timestamp <-> com.datastax.driver.core.LocalDate]

(full stack trace and example table below)

This problem appears to be related to the issue https://datastax-oss.atlassian.net/browse/JAVA-1176, where the Cassandra driver changed the method calls for timestamps but RapidMiner is still using the old method. (On a related note, it would be very useful if we could inject our own custom codecs for Cassandra, https://docs.datastax.com/en/developer/java-driver/3.3/manual/custom_codecs/, which would let developers better handle custom data types stored in Cassandra databases.)

 

Suggested workarounds are welcome. :)

 

Full details follow:

My cassandra table is defined as;

CREATE TABLE $KEYSPACE$.samples (
    context text,
    partition int,
    resource text,
    collected_at timestamp,
    metric_name text,
    value blob,
    attributes map<text, text>,
    PRIMARY KEY ((context, partition, resource), collected_at, metric_name)
);

 

I can perform a query which works OK for most columns in the table. e.g.

 

select context, partition, resource, metric_name from samples;

 

but if I try to select the timestamp using

 

select  collected_at  from samples;

 

rapidminer throws the following exception

Exception: com.rapidminer.operator.OperatorException
Message: Unknown error. Something went wrong.
Stack trace:
com.rapidminer.extension.nosql.operator.cassandra.ReadCassandraOperator.createExampleSet(ReadCassandraOperator.java:112)
  com.rapidminer.operator.io.AbstractExampleSource.read(AbstractExampleSource.java:49)
  com.rapidminer.operator.io.AbstractExampleSource.read(AbstractExampleSource.java:33)
  com.rapidminer.operator.io.AbstractReader.doWork(AbstractReader.java:126)
  com.rapidminer.operator.Operator.execute(Operator.java:1004)
  com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:77)
  com.rapidminer.operator.ExecutionUnit$3.run(ExecutionUnit.java:812)
  com.rapidminer.operator.ExecutionUnit$3.run(ExecutionUnit.java:807)
  java.security.AccessController.doPrivileged(Native Method)
  com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:807)
  com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:428)
  com.rapidminer.operator.Operator.execute(Operator.java:1004)
  com.rapidminer.Process.execute(Process.java:1310)
  com.rapidminer.Process.run(Process.java:1285)
  com.rapidminer.Process.run(Process.java:1176)
  com.rapidminer.Process.run(Process.java:1129)
  com.rapidminer.Process.run(Process.java:1124)
  com.rapidminer.Process.run(Process.java:1114)
  com.rapidminer.gui.ProcessThread.run(ProcessThread.java:65)

Cause
Exception: com.datastax.driver.core.exceptions.CodecNotFoundException
Message: Codec not found for requested operation: [timestamp <-> com.datastax.driver.core.LocalDate]
Stack trace:

  com.datastax.driver.core.CodecRegistry.notFound(CodecRegistry.java:741)
  com.datastax.driver.core.CodecRegistry.createCodec(CodecRegistry.java:588)
  com.datastax.driver.core.CodecRegistry.access$500(CodecRegistry.java:137)
  com.datastax.driver.core.CodecRegistry$TypeCodecCacheLoader.load(CodecRegistry.java:246)
  com.datastax.driver.core.CodecRegistry$TypeCodecCacheLoader.load(CodecRegistry.java:232)
  com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3542)
  com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2323)
  com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2286)
  com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201)
  com.google.common.cache.LocalCache.get(LocalCache.java:3953)
  com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3957)
  com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4875)
  com.datastax.driver.core.CodecRegistry.lookupCodec(CodecRegistry.java:522)
  com.datastax.driver.core.CodecRegistry.codecFor(CodecRegistry.java:485)
  com.datastax.driver.core.CodecRegistry.codecFor(CodecRegistry.java:467)
  com.datastax.driver.core.AbstractGettableByIndexData.codecFor(AbstractGettableByIndexData.java:69)
  com.datastax.driver.core.AbstractGettableByIndexData.getDate(AbstractGettableByIndexData.java:174)
  com.datastax.driver.core.AbstractGettableData.getDate(AbstractGettableData.java:26)
  com.datastax.driver.core.AbstractGettableData.getDate(AbstractGettableData.java:111)
  com.rapidminer.extension.nosql.operator.cassandra.ReadCassandraOperator.createExampleSet(ReadCassandraOperator.java:162)
  com.rapidminer.extension.nosql.operator.cassandra.ReadCassandraOperator.createExampleSet(ReadCassandraOperator.java:97)
  com.rapidminer.operator.io.AbstractExampleSource.read(AbstractExampleSource.java:49)
  com.rapidminer.operator.io.AbstractExampleSource.read(AbstractExampleSource.java:33)
  com.rapidminer.operator.io.AbstractReader.doWork(AbstractReader.java:126)
  com.rapidminer.operator.Operator.execute(Operator.java:1004)
  com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:77)
  com.rapidminer.operator.ExecutionUnit$3.run(ExecutionUnit.java:812)
  com.rapidminer.operator.ExecutionUnit$3.run(ExecutionUnit.java:807)
  java.security.AccessController.doPrivileged(Native Method)
  com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:807)
  com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:428)
  com.rapidminer.operator.Operator.execute(Operator.java:1004)
  com.rapidminer.Process.execute(Process.java:1310)
  com.rapidminer.Process.run(Process.java:1285)
  com.rapidminer.Process.run(Process.java:1176)
  com.rapidminer.Process.run(Process.java:1129)
  com.rapidminer.Process.run(Process.java:1124)
  com.rapidminer.Process.run(Process.java:1114)
  com.rapidminer.gui.ProcessThread.run(ProcessThread.java:65)

Generate a Random Number (New Number With Each Instance)


Hi there

 

I'm working on a process that creates two random numbers. The following ways have been tested already:

  • I used the "Generate Data" operator and tried to use both the attributes as well the label
  • I used the "Generate Attributes" rand()

Each scenario gave a positive result: the output shows the two numbers the way I want. However, if I run the process a second time, the numbers do not change; no new random numbers are generated, and every run of the process outputs the same numbers.

What operator or workaround should I use to get new random numbers each time I run the process?

 

Many thanks in advance!

Roman
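For what it's worth, a likely cause (parameter names from memory, worth verifying in your Studio version): RapidMiner processes are deliberately reproducible, so the process and its operators draw from a fixed random seed by default; changing the process-level random seed (or an operator's "use local random seed" setting) to a varying value such as -1 is the usual way to get fresh numbers on each run. The underlying mechanism can be sketched in Python:

```python
import random
import time

def make_rng(seed=2001):
    # A fixed seed makes the "random" numbers reproducible:
    # every run replays exactly the same sequence.
    return random.Random(seed)

def make_fresh_rng():
    # Seeding from the clock (the analogue of disabling the fixed
    # seed, e.g. setting it to -1) yields new numbers on each run.
    return random.Random(time.time_ns())

# Two generators built with the same fixed seed agree exactly:
print([make_rng().random() for _ in range(2)] ==
      [make_rng().random() for _ in range(2)])  # True
```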

Looping through an example to generate more examples


My data is confidential, so I cannot post it.

Here is an example of what my data looks like:

|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|

I would like to transform it into:

|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|

|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|

|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|

|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|

|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|

|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|

|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|

 

Please help me do this.
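Operators worth looking at for this are De-Pivot or a loop over attribute subsets (names worth double-checking for your version); the underlying transformation is just slicing one wide row into fixed-size column groups. A minimal Python sketch, assuming every group has the same layout (1 empty column plus 10 attributes = 11 columns):

```python
def split_wide_row(row, group_size=11):
    # Split one flat row whose columns repeat in fixed-size groups
    # into one output row per group.
    if len(row) % group_size != 0:
        raise ValueError("row length is not a multiple of the group size")
    return [row[i:i + group_size] for i in range(0, len(row), group_size)]

# Toy example with 3-column groups instead of 11:
wide = ["", "a1", "a2"] * 4
print(split_wide_row(wide, group_size=3))
# [['', 'a1', 'a2'], ['', 'a1', 'a2'], ['', 'a1', 'a2'], ['', 'a1', 'a2']]
```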

 

Loop Files Operator runs forever

How to work with a very large .csv file?


Hi folks, it looks like there have been a similar post or two in the past, but they are years old at this point, so I thought it would be helpful to refresh...

 

I need to load a huge .csv file (4.72GB, ~23MM lines), and I need to break it up into smaller .csv files according to one polynomial attribute. This is public State of Texas data, so the attribute by which I want to split into smaller data sets is "County", and I want those new .csv files to be kicked out onto my local disk. 

 

What's the best, most computationally efficient way to do this? 
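One RapidMiner pattern for this is Loop Values over County feeding Filter Examples and Write CSV, but that rescans the data once per county; outside Studio, a single streaming pass over the file is about as efficient as it gets, since memory use stays constant. A Python sketch (the "County" column name is taken from the post; assumes the file has a header row):

```python
import csv
import os

def split_csv_by_column(path, column, out_dir):
    # Stream the CSV once, appending each row to a per-value output
    # file; memory use is constant regardless of input size.
    os.makedirs(out_dir, exist_ok=True)
    writers, handles = {}, {}
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        for row in reader:
            key = row[column]
            if key not in writers:
                handle = open(os.path.join(out_dir, key + ".csv"),
                              "w", newline="")
                handles[key] = handle
                writer = csv.DictWriter(handle, fieldnames=reader.fieldnames)
                writer.writeheader()
                writers[key] = writer
            writers[key].writerow(row)
    for handle in handles.values():
        handle.close()
```

Texas has a few hundred counties, so keeping one open file handle per county stays well within OS limits.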

 

Thanks!!

 

cc 


No difference between "polynominal" and "text" data type?


Hi,

 

I checked the manual and this forum but I cannot find the answer.

 

Is there possibly no difference between the "text" and "polynominal" data types in RM?

 

I am asking because I worked with quite a large data set, and RM seems to save text variables the same way it does polynominals: as categorical variables (or factors).

The metadata file gets really huge and this slows down RM a lot when loading and handling data.

 

Could this be true?

 

Best

Karl

exploratory data analysis using the chi-square test


I'm having trouble running the chi-square test on my dataset. My dataset has real or polynominal values.

Here is the XML of my process:

<?xml version="1.0" encoding="UTF-8"?>
<process version="8.1.001">
  <operator activated="true" class="retrieve" compatibility="8.1.001" expanded="true" height="68" name="Retrieve house_prices" width="90" x="45" y="34">
    <parameter key="repository_entry" value="//Local Repository/house_prices"/>
  </operator>
</process>

<?xml version="1.0" encoding="UTF-8"?>
<process version="8.1.001">
  <operator activated="true" class="weka:W-ChiSquaredAttributeEval" compatibility="7.3.000" expanded="true" height="82" name="W-ChiSquaredAttributeEval" width="90" x="313" y="34">
    <parameter key="normalize_weights" value="false"/>
    <parameter key="sort_weights" value="true"/>
    <parameter key="sort_direction" value="ascending"/>
    <parameter key="M" value="false"/>
    <parameter key="B" value="false"/>
  </operator>
</process>

can anyone help me please?
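As far as I can tell, the pasted XML is two separate one-operator processes, so nothing is wired into the Weka evaluator; it also needs a nominal label (set with Set Role) before it can score attributes, and real-valued attributes are typically discretized for a chi-square test (worth verifying against your Studio version). Independent of tooling, the statistic itself is easy to compute from a contingency table of attribute value vs. label; a pure-Python sketch:

```python
def chi_square_statistic(table):
    # Pearson chi-square statistic for a 2-D contingency table:
    # sum over cells of (observed - expected)^2 / expected, where
    # expected = row_total * col_total / grand_total.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

print(round(chi_square_statistic([[10, 20], [20, 10]]), 4))  # 6.6667
```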

Web Mining


hi everybody,

I'm new to web mining with RapidMiner, and I have a project that I have to do. Please help!

This is the Yahoo Finance link: https://finance.yahoo.com/quote/TSLA?p=TSLA

I just want to crawl this site and learn whether Tesla stock is seen positively or negatively.

I have to use this result in my RapidMiner project, but like I said, I'm new to web mining with RapidMiner.

I don't know how to do it. Please show me a simple RapidMiner project for my problem.

Convert Confidence Values to Regular Attribute


I am scoring data and get results in a column called "confidence". Easy enough, but this result data is hidden from use in downstream processes - cannot see the column in generate attributes, set role, select attributes, etc. The only place I see the output is in the yellow results column. Is there an operator to retrieve this confidence data and use it in downstream process steps? I can export to excel and reimport but this seems silly. thanks, Mike

Creating a history of stock balance


Hi guys,

Hope you are doing well.

I have been trying to recreate a history of stock balance, but so far, even consulting some people with good experience in rapidminer, I haven’t been able to do what I need. Maybe you guys would be able to help me.

Below is an example set. Basically, I have the initial stock on 02/04. Based on the despatches and receipts of the previous days, I need to calculate the initial stock for 01/04 and 31/03. All the processes I have tried work for a single product code only; when the data moves on to the next product code, they give incorrect values.

 

Code  | Initial Stock on 02/04 | Trx Date   | Trx Time | Trx Type   | Adjust
SAN01 | 100                    | 01/04/2018 | 21:00    | Despatched | -10
SAN01 | 100                    | 01/04/2018 | 20:00    | Received   | 5
SAN01 | 100                    | 01/04/2018 | 19:00    | Despatched | -7
SAN01 | 100                    | 31/03/2018 | 17:00    | Despatched | -6
SAN01 | 100                    | 31/03/2018 | 16:00    | Despatched | -3
SAN01 | 100                    | 31/03/2018 | 15:00    | Received   | 7
SAN02 | 120                    | 01/04/2018 | 21:00    | Despatched | -2
SAN02 | 120                    | 01/04/2018 | 20:00    | Received   | 3
SAN02 | 120                    | 01/04/2018 | 19:00    | Despatched | -5
SAN02 | 120                    | 31/03/2018 | 17:00    | Despatched | -6
SAN02 | 120                    | 31/03/2018 | 16:00    | Despatched | -2
SAN02 | 120                    | 31/03/2018 | 15:00    | Received   | 6

 

Below is what I expect to get.

Code  | Date       | Initial Stock
SAN01 | 01/04/2018 | 88
SAN01 | 31/03/2018 | 86
SAN02 | 01/04/2018 | 116
SAN02 | 31/03/2018 | 114

 

thank you all in advance.
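In RapidMiner terms this usually ends up as Aggregate (net adjustment per code and date) followed by a grouped cumulative sum, and the per-code grouping is exactly what breaks naive single-pass approaches. The core logic can be sketched in Python (function and variable names are mine; it reproduces the expected balances from the post):

```python
from collections import OrderedDict

def reconstruct_stock_history(initial_by_code, transactions):
    # transactions: (code, date, adjust) tuples, already ordered
    # newest-first within each code, as in the example set.
    # Working backwards, each day's opening balance is the later
    # balance plus that day's net adjustment.
    net = OrderedDict()
    for code, date, adjust in transactions:
        net[(code, date)] = net.get((code, date), 0) + adjust
    rows, running = [], {}
    for (code, date), total in net.items():
        running[code] = running.get(code, initial_by_code[code]) + total
        rows.append((code, date, running[code]))
    return rows
```

For SAN01 this yields 100 + (-10 + 5 - 7) = 88 on 01/04 and 88 + (-6 - 3 + 7) = 86 on 31/03, and the running balance restarts correctly when the data moves on to SAN02.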
